CONCENTRATION OF MEASURE AND MIXING FOR 
MARKOV CHAINS 



MALWINA J. LUCZAK 

Abstract. We consider Markovian models on graphs with local dy- 
namics. We show that, under suitable conditions, such Markov chains 
exhibit both rapid convergence to equilibrium and strong concentration 
of measure in the stationary distribution. We illustrate our results with 
applications to some known chains from computer science and statistical 
mechanics. 



1. Introduction 

Recent years have witnessed a surge of activity in the mathematics of 
real-world networks, especially the study of combinatorial and stochastic 
models. Such networks include, for instance, the Internet, social networks, 
and biological networks. The techniques used to analyse them draw from
a range of mathematical disciplines, such as graph theory, probability,
statistical physics, and analysis. Strikingly, random processes with rather
similar characteristics can occur as models of very different real-world
settings.

Random networks can often be regarded as interacting systems of individ- 
uals or particles. Under certain conditions, there is a law of large numbers, 
that is, a large system is close to a deterministic process solving a differen- 
tial equation derived from the average 'drift', with much simpler dynamics. 
Further, one may frequently observe chaoticity, i.e. asymptotic approximate 
independence of particles. Unfortunately it is often difficult to prove the 
validity of such approximations, especially when the random process has an 
unbounded number of components in the limit (e.g. the number of vertices 
or components of size $k$ in a graph of size $n$, for $k = 1, 2, \ldots$, as $n \to \infty$).

In other instances, it may be difficult to establish good rates of conver- 
gence for mean-field approximations, or determine whether the long-term 
and equilibrium behaviour of the random process also follows that of the 
deterministic system. Furthermore, some recent attempts at a more accu- 
rate representation of real networks still await any kind of mathematically 
rigorous analysis. We would hope that over the coming years, the intense 
interest will produce a coherent and widely applicable theory. However, at
present, it often appears that each new problem defies the existing theory
in an interesting way.

In many complex systems, laws of large numbers and high concentration
of measure in equilibrium have been found to co-exist with so-called rapid
mixing 0; 13; HI], that is, mixing in time $O(n \log n)$, where $n$ is a measure of
the system size. (Traditionally, such a system was considered to be rapidly
mixing if it converged to equilibrium in a time polynomial in $n$, but currently
the term is more and more restricted to the 'optimal' mixing time $O(n \log n)$,
see for example [3ll : 0|.) There are some very notable examples of such
behaviour, for instance the subcritical Ising model, see [3; EH and references
therein, as well as the discussion in Section 3.1 of this paper.

The purpose of this article is to propose a new method to establish con- 
centration of measure in complex systems modelled by Markov chains. We 
illustrate the technique with an application to a balls-and-bins model anal- 
ysed in some earlier works by this author and McDiarmid, the supermarket 
model [H; Strong concentration of measure for this model, over long 
time intervals starting from a given state, as well as in equilibrium, was 
established in [l^; using the underlying structure of the model that en- 
abled certain functions to be considered as functions of independent random 
variables so that the bounded differences method could be used. 

In Section 4 of the present article we show that such concentration of
measure inequalities hold more generally, with fewer assumptions on the
structure of the Markov process involved. Our result is somewhat related, in
spirit, to results (and arguments) in [16], which establishes transportation
cost inequalities for the measure at time $t$ and the stationary measure of
a contracting Markov chain, assuming transportation cost inequalities for
the kernel. However, the technical approach adopted here is rather different
from [16]: it is discrete and coupling-based rather than functional analytic,
and, we think, more 'hands on' and easier to use in practice (though our
setting is less general than in [16]). It is striking that our approach,
considerably more general than the one taken in [18], enables us to improve
on the concentration of measure results proved in [18]. (Accordingly, we
could also prove improved versions of results in [19], but we choose not to
pursue this here.) The results in Section 4 also significantly extend
Lemma 2.6 in [15], which bounds the variance of a real-valued, discrete-time,
contracting Markov chain at time $t$ and in equilibrium. We hope many more
applications for the ideas presented here will be found in the future.



2. Notation and definitions 

Let $X = (X_t)_{t \in \mathbb{Z}^+}$ be a discrete-time Markov chain with a
discrete state space $S$ and transition probabilities $P(x, y)$ for
$x, y \in S$, where $\sum_{y \in S} P(x, y) = 1$ for each $x \in S$. We assume
that, for every pair of states $x, y \in S$, $P(x, y) > 0$ if and only if
$P(y, x) > 0$. Then we can form an undirected graph with vertex set $S$ where
$\{x, y\}$ is an edge if and only if $P(x, y) > 0$ and $x \ne y$. In general,
our chains may be lazy, that is, we can have $P(x, x) > 0$ for some $x \in S$.
We assume that the graph is locally finite, that is, each vertex is adjacent
to only finitely many other vertices. We now endow $S$ with a graph metric $d$
given by $d(x, y) = 1$ if $P(x, y) > 0$ and $x \ne y$, and, for all other
$x, y$, $d(x, y)$ is the length of the shortest path between $x$ and $y$ in
the graph, which is assumed to be connected.
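As an illustration of the graph metric just described, the following minimal sketch computes $d(x, y)$ by breadth-first search. The dict-of-dicts representation of $P$ and the four-state lazy walk are assumptions made for the example only; they do not come from the paper.

```python
from collections import deque


def graph_distance(P, x, y):
    """Shortest-path distance between states x and y in the graph where
    {a, b} is an edge iff P[a][b] > 0 and a != b.

    P is a dict of dicts of transition probabilities (an illustrative
    format, not one prescribed by the text)."""
    if x == y:
        return 0
    dist = {x: 0}
    queue = deque([x])
    while queue:
        a = queue.popleft()
        for b, p in P[a].items():
            # Laziness (a self-loop) does not create an edge.
            if p > 0 and b != a and b not in dist:
                dist[b] = dist[a] + 1
                if b == y:
                    return dist[b]
                queue.append(b)
    raise ValueError("states are not connected")


# Hypothetical example: a lazy random walk on the path 0 - 1 - 2 - 3.
P = {
    0: {0: 0.5, 1: 0.5},
    1: {0: 0.25, 1: 0.5, 2: 0.25},
    2: {1: 0.25, 2: 0.5, 3: 0.25},
    3: {2: 0.5, 3: 0.5},
}
```

Because the state space is only assumed locally finite, the search explores neighbours lazily and stops as soon as the target is reached.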

This kind of setting is natural, and many models in applied probability and
combinatorics fit into this framework, including those discussed in Section 3.

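For each $t \in \mathbb{Z}^+$, $X_t$ may be viewed as a random variable on a
measurable space $(\Omega, \mathcal{F})$, where

$$ \Omega = \{\omega = (\omega_0, \omega_1, \ldots) : \omega_i \in S \ \forall i\}, $$

and $\mathcal{F} = \sigma(\bigcup_{t=0}^{\infty} \mathcal{F}_t)$, with
$\mathcal{F}_t = \sigma(X_i : i \le t)$. Then each $X_i$ is the $i$-th
co-ordinate projection, that is, $X_i(\omega) = \omega_i$ for
$i \in \mathbb{Z}^+$. Then the $\sigma$-fields $\mathcal{F}_t$ form the natural
filtration for the process.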

Let $\mathcal{P}(S)$ be the power set of $S$. The law of the Markov chain is a
probability measure $\mathbb{P}$ on $(\Omega, \mathcal{F})$, and is determined
uniquely by the transition matrix $P$ together with a probability measure
$\mu$ on $(S, \mathcal{P}(S))$ that gives the law of the initial state $X_0$,
according to

$$ \mathbb{P}(\{\omega : \omega_j = x_j : j \le i\}) = \mu(\{x_0\}) \prod_{j=0}^{i-1} P(x_j, x_{j+1}), $$

for each $x_0, \ldots, x_i \in S$, for each $i \in \mathbb{Z}^+$. This gives
the law of $(X_t)$ conditional on $\mathcal{L}(X_0) = \mu$, and will be denoted
by $\mathbb{P}_\mu$ in what follows. Let $P^t(x, y)$ be the $t$-step transition
probability from $x$ to $y$, given inductively by

$$ P^t(x, y) = \sum_{z \in S} P^{t-1}(x, z) P(z, y). $$

Then $\mathbb{P}_\mu(X_t \in A) = (\mu P^t)(A)$ for $A \subseteq S$.

Let $\mathbb{E}_\mu$ denote the expectation operator corresponding to
$\mathbb{P}_\mu$. For $t \in \mathbb{Z}^+$ and $f : S \to \mathbb{R}$, define
the function $P^t f$ by

$$ (P^t f)(x) = \sum_{y} P^t(x, y) f(y), \qquad x \in S. $$

In other words, $(P^t f)(x) = \mathbb{E}_{\delta_x}[f(X_t)] = (\delta_x P^t)(f)$,
the expected value of $f(X_t)$ at time $t$ conditional on the Markov process
starting at $x$, i.e. the expectation of the function $f$ with respect to the
measure $\delta_x P^t$. In general, we write
$\mathbb{E}_\mu[f(X_t)] = (\mu P^t)(f)$.

A real-valued function $f$ on $S$ is said to be Lipschitz (or 1-Lipschitz) if

$$ \|f\|_{\mathrm{Lip}} = \sup_{x \ne y} \frac{|f(x) - f(y)|}{d(x, y)} \le 1. $$

Here, equivalently, we only need to consider vertices at distance 1, so $f$ is
Lipschitz if and only if $\sup_{x, y : d(x, y) = 1} |f(x) - f(y)| \le 1$.

Given a probability measure $\mu$ on $(S, \mathcal{P}(S))$ and an $S$-valued
random variable $X$ with law $\mathcal{L}(X) = \mu$, we say that $\mu$ or $X$
has normal concentration if there exist constants $C, c > 0$ such that, for
every $u > 0$, uniformly over 1-Lipschitz functions $f : S \to \mathbb{R}$,

$$ \mu(|f(X) - \mu(f)| \ge u) \le C e^{-c u^2}. \tag{2.1} $$

We say that $\mu$ or $X$ has exponential concentration if there exist
constants $C, c > 0$ such that, for every $u > 0$, uniformly over 1-Lipschitz
functions $f : S \to \mathbb{R}$,

$$ \mu(|f(X) - \mu(f)| \ge u) \le C e^{-c u}. \tag{2.2} $$

These definitions are closely related to the notions used by Ledoux.

In Section 4 we shall give conditions under which a discrete-time Markov
chain $(X_t)$ exhibits normal concentration of measure over long time intervals
and in equilibrium.

For probability measures $\mu_1, \mu_2$ on $(S, \mathcal{P}(S))$, the total
variation distance between $\mu_1$ and $\mu_2$ is given by

$$ d_{TV}(\mu_1, \mu_2) = \frac{1}{2} \sum_{x \in S} |\mu_1(x) - \mu_2(x)| = \sup_{A \subseteq S} |\mu_1(A) - \mu_2(A)|. $$

It is well known that the total variation distance satisfies

$$ d_{TV}(\mu_1, \mu_2) = \inf_{\pi} \pi(X \ne Y), $$

where the infimum is over all couplings $\pi = \mathcal{L}(X, Y)$ of
$S$-valued random variables $X, Y$ such that the marginals are
$\mathcal{L}(X) = \mu_1$ and $\mathcal{L}(Y) = \mu_2$.
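The two expressions for the total variation distance (half the $\ell_1$ distance, and the supremum over events) can be checked against each other on a small example. The sketch below is only a sanity check: the three-point measures are invented for illustration, and the brute-force supremum over all subsets is exponential in the number of states.

```python
from itertools import combinations


def tv_half_l1(mu1, mu2):
    """Total variation distance as half the l1 distance between mass functions."""
    states = set(mu1) | set(mu2)
    return 0.5 * sum(abs(mu1.get(s, 0.0) - mu2.get(s, 0.0)) for s in states)


def tv_sup_over_sets(mu1, mu2):
    """Total variation distance as sup over events A of |mu1(A) - mu2(A)|.

    Enumerates every subset, so this is only usable for tiny state spaces."""
    states = sorted(set(mu1) | set(mu2))
    best = 0.0
    for r in range(len(states) + 1):
        for A in combinations(states, r):
            gap = abs(sum(mu1.get(s, 0.0) for s in A)
                      - sum(mu2.get(s, 0.0) for s in A))
            best = max(best, gap)
    return best


# Hypothetical measures on a three-point space.
mu1 = {"a": 0.5, "b": 0.25, "c": 0.25}
mu2 = {"a": 0.25, "b": 0.25, "c": 0.5}
```

On this example both functions give 0.25, with the supremum attained at the event {"a"}.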

The Wasserstein distance between probability measures $\mu_1$ and $\mu_2$ is
defined as

$$ d_W(\mu_1, \mu_2) = \sup_{f} \left| \int f \, d\mu_1 - \int f \, d\mu_2 \right| = \sup_{f} |\mu_1(f) - \mu_2(f)|, $$

where the supremum is over all measurable 1-Lipschitz functions
$f : S \to \mathbb{R}$. By the Kantorovich-Rubinstein theorem (see 0],
Section 11.8),

$$ d_W(\mu_1, \mu_2) = \inf_{\pi} \{\pi[d(X, Y)] : \mathcal{L}(X) = \mu_1, \mathcal{L}(Y) = \mu_2\}, $$

where the infimum is taken over all couplings $\pi$ on $S \times S$ with
marginals $\mu_1$ and $\mu_2$, and we write $\pi[d(X, Y)]$ for the expectation
of $d(X, Y)$ under the coupling $\pi$. It is well known that the Wasserstein
distance metrises weak convergence in spaces of bounded diameter. Also, since
the discrete space $(S, \mathcal{P}(S))$ is necessarily complete and separable,
so is the space of probability measures on $(S, \mathcal{P}(S))$ metrised by
the Wasserstein distance. See [11] for detailed discussions of various metrics
on probability measures and relationships between them.
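For a concrete feel for the Wasserstein distance, consider the special case where the graph is a path on $\{0, \ldots, m-1\}$, so that $d(x, y) = |x - y|$. In that one-dimensional setting (an assumption of this sketch, not of the paper) the optimal coupling cost is known to equal the sum of absolute differences of the two cumulative distribution functions, which gives a one-pass computation.

```python
def wasserstein_on_path(mu1, mu2):
    """Wasserstein distance between two probability vectors on {0, ..., m-1}
    equipped with d(x, y) = |x - y| (path graph).

    Uses the one-dimensional identity W = sum_k |F1(k) - F2(k)|, where F1, F2
    are the cumulative distribution functions."""
    assert len(mu1) == len(mu2)
    total = 0.0
    cdf_gap = 0.0
    # The last point contributes nothing, since both CDFs equal 1 there.
    for p, q in zip(mu1[:-1], mu2[:-1]):
        cdf_gap += p - q
        total += abs(cdf_gap)
    return total


def mean(mu):
    """Integral of the 1-Lipschitz test function f(x) = x."""
    return sum(x * p for x, p in enumerate(mu))
```

The test function $f(x) = x$ is 1-Lipschitz, so `abs(mean(mu1) - mean(mu2))` is always a lower bound on the distance, by the supremum formula above.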

3. Examples of rapid mixing and concentration 



In this section we give some examples of known Markov chains exhibiting
both concentration of measure in equilibrium and rapid mixing.

3.1. Mean-field Ising model. Let $G = (V, \mathcal{E})$ be a finite graph.
Elements of the state space $S := \{-1, 1\}^V$ will be called configurations,
and for $\sigma \in S$, the value $\sigma(v)$ will be called the spin at $v$.
The nearest-neighbour energy $H(\sigma)$ of a configuration
$\sigma \in \{-1, 1\}^V$ is defined by

$$ H(\sigma) := - \sum_{v, w \in V, \ v \sim w} J(v, w) \sigma(v) \sigma(w), \tag{3.1} $$

where $w \sim v$ means that $\{w, v\} \in \mathcal{E}$. The parameters
$J(v, w)$ measure the interaction strength between vertices; we will always
take $J(v, w) = J$, where $J$ is a positive constant.

For $\beta \ge 0$, the Ising model on the graph $G$ with parameter $\beta$ is
the probability measure $\pi$ on $S$ given by

$$ \pi(\sigma) = \frac{e^{-\beta H(\sigma)}}{Z(\beta)}, \tag{3.2} $$

where $Z(\beta) = \sum_{\sigma \in S} e^{-\beta H(\sigma)}$ is a normalising
constant.

The parameter $\beta$ is interpreted physically as the inverse of temperature,
and measures the influence of the energy function $H$ on the probability
distribution. At infinite temperature, corresponding to $\beta = 0$, the
measure $\pi$ is uniform over $S$ and the random variables
$\{\sigma(v)\}_{v \in V}$ are independent.

The (single-site) Glauber dynamics for $\pi$ is the Markov chain $(X_t)$ on $S$
with transitions as follows. When at $\sigma$, a vertex $v$ is chosen uniformly
at random from $V$, and a new configuration is generated from $\pi$ conditioned
on the set

$$ \{\eta \in S : \eta(w) = \sigma(w), \ w \ne v\}. $$

In other words, if vertex $v$ is selected, the new configuration will agree
with $\sigma$ everywhere except possibly at $v$, and at $v$ the spin is $+1$
with probability

$$ \frac{e^{\beta M_v(\sigma)}}{e^{\beta M_v(\sigma)} + e^{-\beta M_v(\sigma)}}, \tag{3.3} $$

where $M_v(\sigma) := J \sum_{w : w \sim v} \sigma(w)$. Evidently, the
distribution of the new spin at $v$ depends only on the current spins at the
neighbours of $v$. It is easily seen that $(X_t)$ is reversible with respect to
the measure $\pi$ in (3.2), which is thus its stationary measure.

Given a sequence $G_n = (V_n, \mathcal{E}_n)$ of graphs, write $\pi_n$ for the
Ising measure and $(X_t^{(n)})$ for the Glauber dynamics on $G_n$. For a given
configuration $\sigma \in S_n$, let $\mathcal{L}(X_t^{(n)}, \sigma)$ denote the
law of $X_t^{(n)}$ starting from $\sigma$. The worst-case distance to
stationarity of the Glauber dynamics chain after $t$ steps is

$$ d_n(t) := \max_{\sigma \in S_n} d_{TV}(\mathcal{L}(X_t^{(n)}, \sigma), \pi_n). \tag{3.4} $$

The mixing time $t_{mix}(n)$ is defined as

$$ t_{mix}(n) := \min\{t : d_n(t) \le 1/4\}. \tag{3.5} $$
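For a chain small enough to write down its transition matrix, the worst-case distance and the mixing time can be computed exactly by pushing each point mass forward one step at a time. The sketch below is generic rather than specific to Glauber dynamics; the list-of-lists matrix format, the requirement that the stationary vector `pi` be supplied by the caller, and the two-state example are all assumptions made for illustration.

```python
def step(dist, P):
    """One step of the chain: returns the row vector dist * P."""
    n = len(P)
    return [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]


def tv(mu, nu):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def mixing_time(P, pi, eps=0.25, t_max=10_000):
    """Smallest t such that max over starting states of
    d_TV(law at time t, pi) is at most eps."""
    n = len(P)
    # dists[x] is the law of the chain at the current time when started at x.
    dists = [[1.0 if y == x else 0.0 for y in range(n)] for x in range(n)]
    for t in range(t_max + 1):
        if max(tv(d, pi) for d in dists) <= eps:
            return t
        dists = [step(d, P) for d in dists]
    raise RuntimeError("not mixed within t_max steps")


# Hypothetical two-state lazy chain with uniform stationary distribution.
P = [[0.75, 0.25], [0.25, 0.75]]
pi = [0.5, 0.5]
```

For this two-state chain the worst-case distance is $0.5 \cdot 0.5^t$, so the threshold $1/4$ is first met at $t = 1$.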






Note that $t_{mix}(n)$ is finite for each fixed $n$ since, by the convergence
theorem for ergodic Markov chains, $d_n(t) \to 0$ as $t \to \infty$.
Nevertheless, $t_{mix}(n)$ will in general tend to infinity with $n$. It is
natural to ask about the growth rate of the sequence $t_{mix}(n)$.

Definition 1. The Glauber dynamics is said to exhibit a cut-off at $\{t_n\}$
with window size $\{w_n\}$ if $w_n = o(t_n)$ and

$$ \lim_{\gamma \to \infty} \liminf_{n \to \infty} d_n(t_n - \gamma w_n) = 1, $$

$$ \lim_{\gamma \to \infty} \limsup_{n \to \infty} d_n(t_n + \gamma w_n) = 0. $$

Informally, a cut-off is a sharp threshold for mixing. For background on
mixing times and cut-off, see [211 ].

Here we consider the mean-field case, taking $G_n$ to be $K_n$, the complete
graph on $n$ vertices. That is, the vertex set is $V_n = \{1, 2, \ldots, n\}$,
and the edge set $\mathcal{E}_n$ contains all pairs $\{i, j\}$ for
$1 \le i < j \le n$. We take the interaction parameter $J$ to be $1/n$; in this
case, the Ising measure $\pi$ on $\{-1, 1\}^n$ is given by

$$ \pi(\sigma) = \pi_n(\sigma) = \frac{1}{Z(\beta)} \exp\left( \frac{\beta}{n} \sum_{1 \le i < j \le n} \sigma(i) \sigma(j) \right). \tag{3.6} $$

In the physics literature, this is usually referred to as the Curie-Weiss
model. To put this into the framework introduced in Section 2, the state space
$S$ consists of all $n$-vectors with components taking values in $\{-1, 1\}$,
and two vectors are adjacent if they differ in exactly one co-ordinate.
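A single step of the Glauber dynamics in this mean-field setting can be sketched as follows. The update probability used here is the conditional probability of a $+1$ spin under the Gibbs measure with $J = 1/n$ and each pair of vertices counted once in the energy; that reading, the function names, and the list representation of a configuration are assumptions of this sketch.

```python
import math
import random


def prob_plus(sigma, v, beta):
    """Conditional probability that the spin at v is +1 given all other spins,
    for the mean-field Ising measure with interaction J = 1/n."""
    n = len(sigma)
    # Local field at v: (1/n) times the sum of the spins at the other vertices.
    field = (sum(sigma) - sigma[v]) / n
    up = math.exp(beta * field)
    down = math.exp(-beta * field)
    return up / (up + down)


def glauber_step(sigma, beta, rng):
    """One Glauber update: pick a uniform vertex and resample its spin."""
    v = rng.randrange(len(sigma))
    new = list(sigma)
    new[v] = 1 if rng.random() < prob_plus(sigma, v, beta) else -1
    return new
```

Each step changes at most one co-ordinate, so the chain moves along edges of the graph on $\{-1, 1\}^n$ described above (or stays put). At $\beta = 0$ the new spin is a fair coin flip, matching the infinite-temperature case.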

It is a consequence of the Dobrushin-Shlosman uniqueness criterion that
$t_{mix}(n) = O(n \log n)$ when $\beta < 1$; see (See also [3; EH). We shall
see in Section 4 that, in the same regime, the stationary measure $\pi$ (the
Gibbs measure) exhibits normal concentration of measure for Lipschitz
functions in the following sense. Let $X^{(n)}$ be a stationary version of
$X_t^{(n)}$. Then, for some constants $c, C > 0$, for all $u > 0$,

$$ \mathbb{P}_\pi(|f(X^{(n)}) - \mathbb{E}_\pi(f(X^{(n)}))| \ge u) \le C e^{-u^2/cn}, \tag{3.7} $$

uniformly over all 1-Lipschitz functions on $S$ and over all $n$. Thinking
about (3.7) simply as a statement about the measure $\pi$ without any mention
of the process $X_t^{(n)}$, we can also rewrite it in the form

$$ \pi(\{\sigma : |f(\sigma) - \pi(f)| \ge u\}) \le C e^{-u^2/cn}. $$

Inequality (3.7) will follow from Theorem 4.1 (i), and is an improvement on
Proposition 2.7 in [15].

More precise results about the speed of mixing for $\beta < 1$ can be found
in [15], where the occurrence of a cut-off is established. The following is
Theorem 1 from [15].

Theorem 3.1. Suppose that $\beta < 1$. The Glauber dynamics for the Ising
model on $K_n$ has a cut-off at $t_n = [2(1 - \beta)]^{-1} n \log n$ with
window size $n$.

It is also easy to show, using the concentration of the Gibbs measure and
the method used to prove Theorem 1.4 in [19], that asymptotically the spin
values in a bounded set of vertices become almost independent. (In the
language of [3l( - see also references therein - this corresponds to the decay
of correlations or spatial mixing.)

On the other hand, in the case $\beta \ge 1$, there is no rapid mixing, and no
cut-off (see (l5l ; 0] and references therein): $t_{mix}(n)$ is of the order
$n^{3/2}$ when $\beta = 1$ and is exponential in $n$ when $\beta > 1$. For the
same range of $\beta$, the Gibbs measure fails to exhibit normal concentration.

In particular, consider the function $m : S \to \mathbb{R}$ given by
$m(\sigma) = \sum_{i=1}^{n} \sigma(i)$, the magnetisation; it is easy to see
that $m$ is Lipschitz (indeed $m/2$ is 1-Lipschitz, since flipping one spin
changes $m$ by 2), and $\mathbb{E}_\pi(m(X)) = \pi(m) = 0$. However, when
$\beta > 1$, there is a constant $c > 0$ such that

$$ \pi(\{\sigma : m(\sigma) \ge cn\}) = \pi(\{\sigma : m(\sigma) \le -cn\}) \ge 1/4, $$

i.e. $m(X)$ is bi-modal for $\beta > 1$. While there is no bi-modality in the
case $\beta = 1$, it is easy to calculate directly that $m(X)$ is not
concentrated in the sense of (3.7). Further, for $\beta > 1$, the spins of
vertices are no longer approximately independent for large $n$.
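The change of behaviour of the magnetisation can be seen exactly for moderate $n$. Since $\sum_{i<j} \sigma(i)\sigma(j) = (m(\sigma)^2 - n)/2$, the Curie-Weiss weight of a configuration depends only on its magnetisation, and the law of $m$ is obtained by multiplying by the number of configurations with that magnetisation. The sketch below computes this law; the function names and the choice $n = 100$ in the usage note are ours, purely for illustration.

```python
import math


def magnetisation_law(n, beta):
    """Return {k: pi(m = k)} for the mean-field Ising measure on n spins
    with interaction J = 1/n."""
    weights = {}
    for plus in range(n + 1):          # number of +1 spins
        k = 2 * plus - n               # corresponding magnetisation
        # sum_{i<j} sigma(i) sigma(j) = (k^2 - n) / 2, so every configuration
        # with magnetisation k has Gibbs weight exp((beta / n) (k^2 - n) / 2).
        weights[k] = math.comb(n, plus) * math.exp(beta * (k * k - n) / (2 * n))
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}


def mode(law):
    """A non-negative magnetisation value of maximal probability."""
    return max((k for k in law if k >= 0), key=lambda k: law[k])
```

With $n = 100$, `mode(magnetisation_law(100, 0.5))` is 0, whereas for $\beta = 2$ the most likely non-negative value is strictly positive, reflecting the two symmetric peaks away from 0.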



3.2. Supermarket model. Consider the following well-known queueing
model with $n$ separate queues, each with a single server. Customers arrive
into the system in a Poisson process at rate $\lambda n$, where
$0 < \lambda < 1$ is a constant. Upon arrival each customer chooses $d$ queues
uniformly at random with replacement, and joins a shortest queue amongst those
chosen (where she breaks ties by choosing the first of the shortest queues in
the list of $d$). Here $d$ is a fixed positive integer. Customers are served
according to the first-come first-served discipline. Service times are
independent exponentially distributed random variables with mean 1.

A number of authors have studied this model, as well as its extension to 
a Jackson network setting [13; [HI; E; E3; ElEl; S3; S3; S3] . 

For instance, it is shown by Graham in [10] that the system is chaotic,
provided that it starts close to a suitable deterministic initial state, or is
in equilibrium. This means that the paths of members of any fixed finite
subset of queues are asymptotically independent of one another, uniformly
on bounded time intervals. This result implies a law of large numbers for
the time evolution of the proportion of queues of different lengths, that is,
for the empirical measure on path space [10]. In particular, for each fixed
positive integer $k_0$, as $n$ tends to infinity the proportion of queues with
length at least $k_0$ converges weakly (when the infinite-dimensional state
space is endowed with the product topology) to a function $v_t(k_0)$, where
$v_t(0) = 1$ for all $t \ge 0$ and $(v_t(k) : k \in \mathbb{N})$ is the unique
solution to the system of differential equations

$$ \frac{dv_t(k)}{dt} = \lambda (v_t(k-1)^d - v_t(k)^d) - (v_t(k) - v_t(k+1)) \tag{3.8} $$

for $k \in \mathbb{N}$. Here one needs to assume appropriate initial conditions
$(v_0(k) : k \in \mathbb{N})$ such that
$1 \ge v_0(1) \ge v_0(2) \ge \cdots \ge 0$. Further, again for a fixed
positive integer $k_0$, as $n$ tends to infinity, in the equilibrium
distribution this proportion converges in probability to
$\lambda^{1 + d + \cdots + d^{k_0 - 1}}$, and thus the probability that a given
queue has length at least $k_0$ also converges to
$\lambda^{1 + d + \cdots + d^{k_0 - 1}}$.
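The link between the differential equations and the equilibrium proportions can be checked numerically: the values $\lambda^{1 + d + \cdots + d^{k-1}}$ make the right-hand side of the system vanish. The sketch below evaluates that right-hand side on a truncated vector and includes a plain Euler integrator; the truncation level, the step size and the function names are illustrative assumptions, and the last entry of the vector is simply held fixed as a boundary value.

```python
def fixed_point(lam, d, k):
    """lambda ** (1 + d + ... + d**(k-1)); the empty sum for k = 0 gives 1."""
    return lam ** sum(d ** j for j in range(k))


def drift(v, lam, d):
    """Right-hand side of the ODE system for k = 1, ..., K-1,
    given the truncated vector v = [v(0), ..., v(K)]."""
    return [lam * (v[k - 1] ** d - v[k] ** d) - (v[k] - v[k + 1])
            for k in range(1, len(v) - 1)]


def euler(v, lam, d, dt, steps):
    """Explicit Euler integration; v[0] and v[-1] are held fixed."""
    v = list(v)
    for _ in range(steps):
        dv = drift(v, lam, d)
        for k in range(1, len(v) - 1):
            v[k] += dt * dv[k - 1]
    return v
```

For example, `drift([fixed_point(0.5, 2, k) for k in range(8)], 0.5, 2)` is zero up to rounding, while starting `euler` from the empty system `[1.0, 0.0, ..., 0.0]` moves `v[1]` upward at initial rate $\lambda$.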

Although the above results refer only to fixed queue length $k_0$ and bounded
time intervals, they suggest that when $d \ge 2$, in equilibrium the maximum
queue length may usually be $O(\log \log n)$. One of the contributions
of [18] is to show that this is indeed the case, and to give precise results on
the behaviour of the maximum queue length. In particular, it turns out that
when $d \ge 2$, with probability tending to 1 as $n \to \infty$, in the
equilibrium distribution the maximum queue length takes at most two values; and
these values are $\log \log n / \log d + O(1)$. Along the way, it is also shown
in [18] that the system is rapidly mixing, that is, the distribution settles
down quickly to the equilibrium distribution. In this context, 'quickly' will
mean 'in time $O(\log n)$', as this is a continuous-time process with events
happening at rate $n$, and so $O(\log n)$ corresponds to $O(n \log n)$ steps of
the discrete-time jump chain. It is further established in [18] that the
equilibrium measure is strongly concentrated.

Another natural question concerns fluctuations when in the equilibrium
distribution: how long does it take to see large deviations of the maximum
queue length from its stationary median? An answer is provided in [18]
by establishing strong concentration estimates (for Lipschitz functions of
the queue-lengths vector) over time intervals of length polynomial in $n$. The
techniques in [18] are partly combinatorial, and are used also in [17] and [19].
In particular, in [19], the concentration estimates obtained in [18] are used
to establish quantitative results on the convergence of the distribution of a
queue length and on 'propagation of chaos'.

Let us start by discussing the rapid mixing results known for the
supermarket model. In [18] two rapid mixing results are established, one in
terms of the Wasserstein distance and one in terms of the total variation
distance. Unlike for the Ising model in Section 3.1, it turns out to be
inappropriate to look at the worst-case mixing time, that is, the supremum of
the mixing times over all possible starting states. In the present case, this
quantity is unbounded: the state space is unbounded, and the time to
equilibrium from states $x$ with the total number of customers
$\|x\|_1 = k \gg n$ is of the order at least $k$. Then the best one can do is
to obtain good upper bounds on the mixing time for copies of the Markov chain
starting from nice states - that is, states where the queues are not too
'over-loaded'. This is made more precise below.

Let $X_t^{(n)}$ or $X_t$ be the queue-lengths vector
$(X_t^{(n)}(1), \ldots, X_t^{(n)}(n))$ in the supermarket model with $n$
servers. For a positive integer $n$, $(X_t^{(n)})$ is an ergodic
continuous-time Markov chain, with a unique stationary distribution
$\pi^{(n)}$ or $\pi$.

For any given state $x$ write $\mathcal{L}(X_t^{(n)}, x)$ to denote the law of
$X_t^{(n)}$ given $X_0^{(n)} = x$. Also, for $\epsilon > 0$, the mixing time
$\tau^{(n)}(\epsilon, x)$ starting from $x$ is defined by

$$ \tau^{(n)}(\epsilon, x) = \inf\{t \ge 0 : d_{TV}(\mathcal{L}(X_t^{(n)}, x), \pi^{(n)}) \le \epsilon\}. $$

The result below, Theorem 1.1 in [18], shows that starting from an initial
state in which the queues are not too long, the mixing time is small. In
particular, if $\epsilon > 0$ is fixed and $\mathbf{0}$ denotes the all-zero
$n$-vector, then $\tau^{(n)}(\epsilon, \mathbf{0})$ is $O(\log n)$.

Theorem 3.2. Let $0 < \lambda < 1$ and let $d$ be a fixed positive integer.
For each constant $c > 0$ there exists a constant $\eta > 0$ such that the
following holds for each positive integer $n$. Consider any distribution of the
initial queue-lengths vector $X_0^{(n)}$, and for each time $t \ge 0$ let

$$ \delta_{n,t} = \mathbb{P}(\|X_0^{(n)}\|_1 > cn) + \mathbb{P}(M_0^{(n)} > \eta t). $$

Then

$$ d_{TV}(\mathcal{L}(X_t^{(n)}), \pi^{(n)}) \le n e^{-\eta t} + 2 e^{-\eta n} + \delta_{n,t}. $$

The $O(\log n)$ upper bound on the mixing time $\tau$ is of the right order.
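As a worked illustration of how the $O(\log n)$ bound for the all-zero starting state follows, here is a short derivation. It assumes that the bound in Theorem 3.2 reads as in the first line below, with $\eta$ the constant from the theorem, $M_0$ the initial maximum queue length and $\delta_{n,t}$ the initial-state term; the explicit threshold on $n$ is our own bookkeeping and is not a statement taken from [18].

```latex
% Start from the empty system: X_0 = \mathbf{0}, so \|X_0\|_1 = 0 and M_0 = 0,
% hence \delta_{n,t} = 0 for every t \ge 0.
\begin{align*}
d_{TV}\bigl(\mathcal{L}(X_t^{(n)}), \pi^{(n)}\bigr)
  &\le n e^{-\eta t} + 2 e^{-\eta n} + \delta_{n,t}
   = n e^{-\eta t} + 2 e^{-\eta n}. \\
\intertext{Fix $\epsilon \in (0,1)$. If $n \ge \eta^{-1}\log(4/\epsilon)$ and
$t \ge \eta^{-1}\log(2n/\epsilon)$, then}
n e^{-\eta t} &\le \epsilon/2, \qquad 2 e^{-\eta n} \le \epsilon/2, \\
\text{so}\quad \tau^{(n)}(\epsilon, \mathbf{0})
  &\le \eta^{-1}\log(2n/\epsilon) = O(\log n).
\end{align*}
```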



Indeed, it is also proven in [18] that, for a suitable constant $\theta > 0$,
if $t \le \theta \log n$ then

$$ d_{TV}(\mathcal{L}(X_t^{(n)}), \pi^{(n)}) = 1 - e^{-\Omega(\log^2 n)}. \tag{3.9} $$

Thus $\tau^{(n)}(\epsilon, \mathbf{0})$ is $\Theta(\log n)$ as long as both
$\epsilon^{-1}$ and $(1 - \epsilon)^{-1}$ are bounded polynomially in $n$.

It would be interesting to consider the mixing times more precisely, to
establish whether the supermarket model exhibits a cut-off. Again, here
we should not be considering the worst-case mixing time, but rather the
worst case over a subset of 'good' initial states, which are states where the
total number of customers is not too large and the maximum queue not
too long. Also, to bring the supermarket model into the discrete framework
of Section 2, let us consider the jump chain of the supermarket model. We
shall denote the jump chain by $\hat{X}_t^{(n)}$ or $\hat{X}_t$ in what
follows, and its stationary measure by $\hat{\pi}^{(n)}$ or $\hat{\pi}$.

The transition probabilities of the jump chain are as follows. Given the
state at time $t$ is $x$, the next event is an arrival with probability
$\lambda/(\lambda + 1)$ and is a potential departure with probability
$1/(\lambda + 1)$. Here 'potential' means that it may be a departure or no
change of state at all. Given that the next event is an arrival, the queue to
which the new customer is sent is determined by selecting a uniformly random
$d$-tuple of queues and directing the customer to a shortest queue among those
chosen, in the same way as for the continuous-time process. Given that the next
event is a potential departure, the departure queue is chosen uniformly at
random from among all $n$ queues. Then a customer will depart if the selected
queue is non-empty; otherwise, nothing happens. It is easy to adapt the proofs
in [18] (where the arguments are, in fact, based on analysing the jump chain)
to show that Theorem 3.2 implies mixing in time of the order $O(n \log n)$ from
initial states $x$ such that $\|x\|_1 = O(n)$ and $\|x\|_\infty = O(\log n)$.
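The jump chain described above is straightforward to simulate. The following minimal sketch performs one transition; the function name, the list representation of the queue-lengths vector and the explicit random number generator argument are choices made for this illustration.

```python
import random


def jump_chain_step(x, lam, d, rng):
    """One transition of the supermarket jump chain from queue-lengths vector x.

    lam is the arrival parameter (0 < lam < 1) and d the number of choices."""
    n = len(x)
    y = list(x)
    if rng.random() < lam / (lam + 1.0):
        # Arrival: sample d queues uniformly with replacement and join a
        # shortest one; min() returns the first minimiser, which matches the
        # rule of breaking ties in favour of the first shortest queue listed.
        choices = [rng.randrange(n) for _ in range(d)]
        target = min(choices, key=lambda q: y[q])
        y[target] += 1
    else:
        # Potential departure: pick a uniform queue; a customer leaves only
        # if that queue is non-empty, otherwise nothing happens.
        q = rng.randrange(n)
        if y[q] > 0:
            y[q] -= 1
    return y
```

Every transition changes the state by at most one customer in one queue, so the chain moves between states at $\ell_1$-distance at most 1, in line with the local dynamics of Section 2.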
Accordingly, we make the following conjecture:

Conjecture 3.3. Let $c$ be a positive constant, and let $S_0^{(n)}$ be the set
of all queue-lengths vectors $x$ in the $n$-server supermarket model such that
$\|x\|_1 \le cn$ and $\|x\|_\infty \le c \log n$. Let $\epsilon > 0$, and let

$$ d_n(\epsilon, t) = \sup_{x \in S_0^{(n)}} d_{TV}(\mathcal{L}(\hat{X}_t^{(n)}, x), \hat{\pi}^{(n)}). $$

Then $d_n(\epsilon, t)$ has a cut-off in the sense of Definition 1 with window
size $n$.

Our conjecture appears supported by some simulation results. Also it is
supported by Conjecture 1 from [15], which states that the Glauber dynamics
for the Ising model on transitive graphs $G_n$ has a cut-off if the mixing
time is $O(n \log n)$. The jump chain of the supermarket process is of a
similar type to Glauber dynamics in that it makes only local transitions, and
has mixing time of the order $O(n \log n)$, starting from good initial states.
Also, it has a lot of symmetry - its stationary distribution is exchangeable.
Thus the supermarket chain appears a good candidate for cut-off, though proving
it may not be easy.

More generally, perhaps cut-off can be proven to be a phenomenon that
also co-occurs with rapid mixing and concentration of measure in equilibrium
much more widely, in the context of Markov chains whose jumps are suitably
local.

In [18], the authors upper bound mixing in terms of the total variation
distance by first upper bounding the Wasserstein distance between the
distribution of the process at time $t$ and the stationary distribution. The
following result is Lemma 2.1 in [18].

Theorem 3.4. Let $0 < \lambda < 1$ and let $d$ be a fixed positive integer.
For each constant $c > 0$ there exists a constant $\eta > 0$ such that the
following holds for each positive integer $n$. Let $M$ denote the stationary
maximum queue length. Consider any distribution of the initial queue-lengths
vector $X_0$ such that $\|X_0\|_1$ has finite mean. For each time $t \ge 0$ let

$$ \delta_{n,t} = 2 \mathbb{E}[\|X_0\|_1 \mathbf{1}_{\|X_0\|_1 > cn}] + 2cn \, \mathbb{P}(M_0 > \eta t). $$

Then

$$ d_W(\mathcal{L}(X_t), \pi) \le n e^{-\eta t} + 2cn \, \mathbb{P}_\pi(M > \eta t) + 2 e^{-\eta n} + \delta_{n,t}. $$

The upper bounds on the Wasserstein and total variation distance, and
thus on the mixing time, are proven in [18] by means of a monotone coupling.
The coupling takes two copies of the queueing process starting in
adjacent states (that is, states differing in one customer in one queue) and
couples their paths together in such a way that the $\ell_1$-distance between
them is non-increasing (and so always stays equal to 1 until the processes
coalesce). Furthermore, the coupling is such that with high probability the
$\ell_1$-distance rapidly becomes 0. The coupling is then extended to all pairs
of starting states with not too many customers in queues using the fact that
the Wasserstein distance is a metric on the space of probability measures,
or a path-coupling argument 0].



The property that the $\ell_1$-distance is non-increasing in the coupling
in [18] is very strong and not commonly encountered in path-coupling scenarios.
This property is exploited in [18] to prove strong concentration of measure
for the supermarket process, starting from a fixed (or highly concentrated)
state, for a long time interval. The following is Lemma 4.3 in [18].

Lemma 3.5. There is a constant $c > 0$ such that the following holds. Let
$n \ge 2$ be an integer and let $f$ be a 1-Lipschitz function on the state
space (set of all queue-lengths vectors) $S$. Let also $x_0 \in S$ and assume
that the queue-lengths process $(X_t)$ satisfies $X_0 = x_0$ a.s. Let
$\mu_t = \mathbb{E}_{\delta_{x_0}}[f(X_t)]$. Then for all times $t > 0$ and all
$u > 0$,

$$ \mathbb{P}_{\delta_{x_0}}(|f(X_t) - \mu_t| \ge u) \le n e^{-\frac{c u^2}{nt + u}}. \tag{3.10} $$

Lemma 4.3 in [18] is proven by observing that the supermarket process
can be 'simulated' by two independent Poisson processes, the arrivals process
(with rate $\lambda n$) and the (potential) departure process (with rate $n$),
together with corresponding independent choices of queues ($d$ independent
uniformly random choices for each event in the arrivals process, and one
uniformly random choice in the departures process). One then conditions
on the number of events in the interval $[0, t]$, and then the state at time
$t$ is conditionally determined by a finite family of independent random
variables. In other words, the argument is, just like most of the other
arguments in [18], based on studying the jump chain $(\hat{X}_t)$, although
this is not made explicit therein.

The non-increasing distance coupling property is used to show that a 
Lipschitz function of the queue lengths vector must satisfy a bounded dif- 
ferences condition, so that the discrete bounded differences inequality can 
be applied to show concentration of measure for Lipschitz functions in the 
conditional space. The proof is then completed by deconditioning. 

The rapid mixing result can be combined with the long-term concentration
of measure result to prove concentration of measure in equilibrium for
Lipschitz functions of the queue-lengths vector. The following is Lemma 4.1
in [18].

Lemma 3.6. There is a constant $c > 0$ such that the following holds. Let
$n \ge 2$ be an integer and consider the $n$-queue system. Let the
queue-lengths vector $Y$ have the equilibrium distribution. Let $f$ be a
1-Lipschitz function on $S$. Then for each $u > 0$

$$ \mathbb{P}_\pi(|f(Y) - \mathbb{E}_\pi[f(Y)]| \ge u) \le n e^{-cu/n^{1/2}}. \tag{3.11} $$

Lemmas 3.5 and 3.6 prove strong concentration of measure - normal
concentration for small deviations and exponential concentration for larger
deviations in the case of starting from a fixed state, and exponential
concentration in equilibrium. The factor $n$ in the bound on the right-hand
sides of both (3.10) and (3.11) is a limitation of the technique and not the
right answer. It is natural to expect the truth to be a lot better - that it
can be replaced by a constant. In Section 5 we develop concentration
inequalities that achieve that. Although we work with the discrete-time jump
chain, it is easy to see that our results apply also to the continuous-time
chain. One further advantage of our inequalities is that they apply to other
settings - for instance where rapid mixing is established by a coupling, but
the coupling does not have additional useful properties such as the
non-increasing Wasserstein distance.

Even so, Lemmas 3.5 and 3.6 are quite powerful. We now explore, briefly,
some results concerning the queue lengths in the supermarket model in
equilibrium that can be obtained using Lemma 3.6. The following is Lemma 4.2
in [18]. (We drop the subscript $\pi$ to lighten the notation.)

Lemma 3.7. Consider the $n$-queue system, and let the queue-lengths vector
$Y$ have the equilibrium distribution. For each non-negative integer $k$, let
$\ell(k, y)$ denote the number of queues of length at least $k$ in state $y$.
Also, for each non-negative integer $k$, let
$\ell(k) = \mathbb{E}[\ell(k, Y)]$. Then for any constant $c > 0$,

$$ \mathbb{P}\Bigl(\sup_{k} |\ell(k, Y) - \ell(k)| \ge c n^{1/2} \log^2 n\Bigr) = e^{-\Omega(\log^2 n)}. $$

Also, there exists a constant $c > 0$ such that

$$ \sup_{k} \mathbb{P}(|\ell(k, Y) - \ell(k)| \ge c n^{1/2} \log n) = o(1). $$

Furthermore, for each integer $r \ge 2$,

$$ \sup_{k} |\mathbb{E}[\ell(k, Y)^r] - \ell(k)^r| = O(n^{r-1} \log^2 n). $$
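To see where the first estimate in Lemma 3.7 comes from, one can plug a deviation of size $c' n^{1/2} \log^2 n$ into the equilibrium bound of Lemma 3.6, read here as $n e^{-cu/n^{1/2}}$. The sketch below does this for a single fixed $k$, using that $\ell(k, \cdot)$ is 1-Lipschitz (adding or removing one customer changes the number of queues of length at least $k$ by at most 1). The constant $c'$ is our notation, and the passage to the supremum over $k$ is not reproduced here.

```latex
\begin{align*}
\mathbb{P}\bigl(|\ell(k,Y) - \ell(k)| \ge c' n^{1/2}\log^2 n\bigr)
  &\le n \exp\!\Bigl(-\frac{c \, c' n^{1/2}\log^2 n}{n^{1/2}}\Bigr) \\
  &= \exp\bigl(\log n - c \, c' \log^2 n\bigr)
   = e^{-\Omega(\log^2 n)}.
\end{align*}
```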

Lemma 5.1 in [18], stated below, yields further precise information about
the equilibrium behaviour, over long time intervals.

Lemma 3.8. Let $K > 0$ be an arbitrary constant and let $\tau = n^K$. Let
$(Y_t)$ be in equilibrium and let $c > 0$ be a constant. Let $B_\tau$ be the
event that for all times $t$ with $0 \le t \le \tau$

$$ \sup_{i} |\ell(i, Y_t) - n \lambda^{1 + d + \cdots + d^{i-1}}| \le c n^{1/2} \log^2 n. $$

Then $\mathbb{P}(\overline{B_\tau}) \le e^{-\Omega(\log^2 n)}$.



In [18], Lemma 5.1 is used to prove two-point concentration for the
stationary maximum queue length and its concentration on only a constant
number of values over long time intervals. This is Theorem 1.3 in [18]:

Theorem 3.9. Let $0 < \lambda < 1$ and let $d \ge 2$ be an integer. Then there
exists an integer-valued function
$m_d = m_d(n) = \log \log n / \log d + O(1)$ such that the following holds. For
each positive integer $n$, suppose that the queue-lengths vector $X_0^{(n)}$ is
in the stationary distribution (and thus so is the maximum queue length
$M_0^{(n)}$). Then for each time $t \ge 0$, $M_t^{(n)}$ is $m_d(n)$ or
$m_d(n) - 1$ with probability tending to 1 as $n \to \infty$; and further, for
any constant $K > 0$ there exists $c = c(K)$ such that, with probability
tending to 1 as $n \to \infty$,

$$ \max_{0 \le t \le n^K} |M_t^{(n)} - \log \log n / \log d| \le c. \tag{3.12} $$

0<t<n K 



The functions $m_2(n), m_3(n), \ldots$ may be defined as follows. For $d = 2, 3, \ldots$
let $i_d(n)$ be the least integer $i$ such that $\lambda^{(d^i-1)/(d-1)} < n^{-1/2}\log^2 n$. Then
we let $m_2(n) = i_2(n) + 1$, and for $d \ge 3$ let $m_d(n) = i_d(n)$. (As we have
seen, with high probability the proportion of queues of length at least $i$ is
close to $\lambda^{(d^i-1)/(d-1)}$.)

Also, equation (37) in [18] shows that, for $r = O(\log n)$,

$$\mathbb{P}(M \ge m_d(n) + r) \le e^{-cr\log n}, \tag{3.13}$$

for a constant $c > 0$.



In [19], strong concentration of measure results from [18] are used to
show that in equilibrium the distribution of a typical queue length converges
to an explicit limiting distribution, and to provide explicit convergence rates.
Let $Y^{(n)}(1)$ denote the equilibrium length of queue 1. (Note that the
equilibrium distribution is exchangeable.) The following is Theorem 1.1
in [19]. Let $\mathcal{L}_{\lambda,d}$ denote the law of a random variable $Y$ such that
$\mathbb{P}(Y \ge k) = v(k)$, where $v(k) = \lambda^{(d^k-1)/(d-1)}$ for each $k = 0, 1, \ldots$. Note that
$\mathbb{P}(Y^{(n)}(1) \ge 1) = \lambda = v(1)$.
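As a quick numerical illustration of the limiting law, the tail $v(k)$ can be evaluated directly. The sketch below is not taken from [19]; the function names `tail`, `pmf` and `mean` are hypothetical. The exponent is computed as $1 + d + \cdots + d^{k-1}$, which equals $(d^k-1)/(d-1)$ for $d \ge 2$, so that the case $d = 1$ reduces to the geometric tail $\lambda^k$. It assumes $0 < \lambda < 1$.

```python
def tail(k: int, lam: float, d: int) -> float:
    """v(k) = lam ** (1 + d + ... + d**(k-1)); the empty sum gives v(0) = 1."""
    exponent = sum(d ** j for j in range(k))
    return lam ** exponent


def pmf(k: int, lam: float, d: int) -> float:
    """P(Y = k) = v(k) - v(k + 1) for a variable with tail function v."""
    return tail(k, lam, d) - tail(k + 1, lam, d)


def mean(lam: float, d: int, tol: float = 1e-15) -> float:
    """E[Y] = sum over k >= 1 of P(Y >= k), truncated once terms fall below tol.

    Assumes 0 < lam < 1, so that the terms decrease to zero.
    """
    total, k = 0.0, 1
    while True:
        term = tail(k, lam, d)
        if term < tol:
            return total
        total += term
        k += 1


if __name__ == "__main__":
    # Illustrative parameters only (not from the paper).
    for d in (1, 2, 3):
        print(d, [round(tail(k, 0.7, d), 6) for k in range(5)], mean(0.7, d))
```

In this sketch `pmf(0, lam, d)` equals $1 - \lambda$ for every $d$, while for $d \ge 2$ the tail beyond $k = 1$ decays doubly exponentially in $k$.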

Theorem 3.10. For each positive integer $n$ let $Y^{(n)}$ be a queue-lengths
$n$-vector in equilibrium, and consider the length $Y^{(n)}(1)$ of queue 1. Then

$$d_{TV}\big(\mathcal{L}(Y^{(n)}(1)),\ \mathcal{L}_{\lambda,d}\big)$$

is of order $n^{-1}$ up to logarithmic factors.

In fact, it is proven in [19] that the above total variation distance is
$o(n^{-1}\log^3 n)$ and is $\Omega(n^{-1})$. Also, the following holds (Corollary 1.2 in [19]).

Corollary 3.11. For each positive integer $k$, the difference between the
$k$th moment $\mathbb{E}[Y^{(n)}(1)^k]$ and the $k$th moment of $\mathcal{L}_{\lambda,d}$ is of order $n^{-1}$ up
to logarithmic factors.

The above results concern the distribution of a single queue length. One
may also consider collections of queues and chaoticity. The terms 'chaoticity'
and 'propagation of chaos' come from statistical physics [13], and the
original motivation was the evolution of particles in physical systems. The
subject has since received considerable attention, especially following
the ground-breaking work of Sznitman [2a|.

The result below (Theorem 1.4 in [19]) establishes chaoticity for the
supermarket model in equilibrium. We see that for fixed $r$ the total variation
distance between the joint law of $r$ queue lengths and the product law is at
most $O(n^{-1})$, up to logarithmic factors. More precisely and more generally
we have:

Theorem 3.12. For each positive integer $n$, let $Y^{(n)}$ be a queue-lengths
$n$-vector in equilibrium. Then, uniformly over all positive integers $r \le n$,
the total variation distance between the joint law of $Y^{(n)}(1), \ldots, Y^{(n)}(r)$ and
the product law $\mathcal{L}(Y^{(n)}(1))^{\otimes r}$ is at most $O(n^{-1}\log^2 n\,(2\log\log n)^r)$; and the
total variation distance between the joint law of $Y^{(n)}(1), \ldots, Y^{(n)}(r)$ and the
limiting product law $\mathcal{L}_{\lambda,d}^{\otimes r}$ is at most $O(n^{-1}\log^2 n\,(2\log\log n)^{r+1})$.



Analogous time-dependent results (away from equilibrium) are also given
in [19], proven using Lemma 3.5 above (Lemma 4.3 in [18]), but we omit
them here for the sake of brevity. Let us mention that the arguments used
in [19] to prove Theorems 1.1 and 1.4 (Theorems 3.10 and 3.12 above) are
quite generic and would apply in many other settings. The main properties
needed are concentration of measure for Lipschitz functions of the state vector,
the polynomial form of the generator of the Markov process, and, in the case
of Theorem 1.1, also the exchangeability of the stationary distribution. The
chaoticity result Theorem 3.12 above is a quantitative version of some of the
results in |2£



To conclude this section, we mention that analogues of results in [18; 19]
are proved in [17] for a related balls-and-bins model, where, instead of
queueing up to receive service on a first-come first-served basis, customers (balls)
have independent exponentially distributed 'lifetimes' and each departs its
queue (bin) as soon as its lifetime has expired.

Current work in progress [9] includes extensions of the results in [18; 19]
to the supermarket model where the number of choices $d = d(n)$ and the
arrival rate $\lambda = \lambda(n)$ are $n$-dependent, including the interesting case where
$d \to \infty$ and $\lambda \to 1$ with various functional dependencies between $\lambda$ and $d$.



4. Coupling and bounded differences method generalised 

This section contains our main results and applications. We use the
notation introduced in Section 2.

Let us state our first theorem, which gives concentration of measure for
Lipschitz functions of a discrete-time Markov chain on state space $S$ and
with transition matrix $P$ at time $t$, under assumptions on the Wasserstein
distance between its $i$-step transition measures for $i \le t$.

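Theorem 4.1. Let $P$ be the transition matrix of a discrete-time Markov
chain with discrete state space $S$.

(i) Let $(\alpha_i : i \in \mathbb{N})$ be a sequence of positive constants such that, for all $i$,

$$\sup_{x,y \in S :\, d(x,y)=1} d_W(\delta_x P^i, \delta_y P^i) \le \alpha_i. \tag{4.1}$$

Let $f$ be a 1-Lipschitz function. Then for all $u > 0$, $x_0 \in S$, and $t > 0$,

$$\mathbb{P}_{\delta_{x_0}}\big(|f(X_t) - \mathbb{E}_{\delta_{x_0}}[f(X_t)]| \ge u\big) \le 2e^{-u^2/(2\sum_{i=1}^{t}\alpha_i^2)}. \tag{4.2}$$

(ii) More generally, let $S_0$ be a non-empty subset of $S$, and let $(\alpha_i : i \in \mathbb{N})$
be a sequence of positive constants such that, for all $i$,

$$\sup_{x,y \in S_0 :\, d(x,y)=1} d_W(\delta_x P^i, \delta_y P^i) \le \alpha_i. \tag{4.3}$$

Let

$$S_0^0 = \{x \in S_0 : y \in S_0 \text{ whenever } d(x,y) = 1\}.$$

Let $f$ be a 1-Lipschitz function. Then for all $x_0 \in S_0^0$, $u > 0$ and $t > 0$,

$$\mathbb{P}_{\delta_{x_0}}\big(\{|f(X_t) - \mathbb{E}_{\delta_{x_0}}[f(X_t)]| \ge u\} \cap \{X_s \in S_0^0 : 0 \le s \le t\}\big) \le 2e^{-u^2/(2\sum_{i=1}^{t}\alpha_i^2)}. \tag{4.4}$$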
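To see how constants $\alpha_i$ of the kind appearing in (4.1) can be computed in a concrete case, consider a toy example that does not appear in the paper: the jump chain of a single-server queue truncated at capacity $m$, that is, a birth-and-death chain on $\{0, \ldots, m\}$ with a hypothetical arrival parameter `lam`. On a path graph the Wasserstein distance between two distributions equals the sum of the absolute differences of their cumulative distribution functions, so $\alpha_i$ can be evaluated exactly. All function names below are illustrative assumptions, and the sketch makes no claim about the chains studied later in this section.

```python
import math


def transition_matrix(m, lam):
    """Birth-and-death chain on {0, ..., m}: up w.p. lam/(1+lam), down otherwise.

    Moves that would leave the state space are replaced by staying put.
    """
    p = lam / (1.0 + lam)
    q = 1.0 - p
    P = [[0.0] * (m + 1) for _ in range(m + 1)]
    for x in range(m + 1):
        P[x][x + 1 if x < m else x] += p
        P[x][x - 1 if x > 0 else x] += q
    return P


def step(mu, P):
    """One step of the chain applied to a row vector: returns mu P."""
    n = len(P)
    return [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]


def wasserstein_line(mu, nu):
    """Wasserstein-1 distance on {0, ..., m} with d(x, y) = |x - y|."""
    total, cum = 0.0, 0.0
    for a, b in zip(mu, nu):
        cum += a - b          # difference of the two CDFs at this point
        total += abs(cum)
    return total


def alphas(P, t):
    """alpha_i = max over adjacent x, y of d_W(delta_x P^i, delta_y P^i), i = 1..t."""
    n = len(P)
    rows = [[1.0 if y == x else 0.0 for y in range(n)] for x in range(n)]
    out = []
    for _ in range(t):
        rows = [step(r, P) for r in rows]
        out.append(max(wasserstein_line(rows[x], rows[x + 1]) for x in range(n - 1)))
    return out


def tail_bound(u, alpha):
    """Right-hand side of a bound of the form 2 exp(-u^2 / (2 sum alpha_i^2))."""
    return 2.0 * math.exp(-u * u / (2.0 * sum(a * a for a in alpha)))


if __name__ == "__main__":
    a = alphas(transition_matrix(10, 0.5), 200)
    print(a[0], a[-1], tail_bound(10.0, a))
```

In this toy chain the first constant equals 1 and the later ones decay, so the sum of squares stays bounded as $t$ grows, which is the situation exploited in the corollary that follows.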

If the Markov chain becomes contractive after a finite number of steps,
then one can deduce from Theorem 4.1 concentration results for the
stationary measure of the Markov chain, as in the following corollary.

Corollary 4.2. (i) Suppose that there exist $x \in S$ and a sequence $\alpha_i : S \to \mathbb{R}^+$
of functions such that, for all $y \in S$,

$$d_W(\delta_x P^i, \delta_y P^i) \le \alpha_i(y), \tag{4.5}$$

where $\alpha_i(y) \to 0$ as $i \to \infty$ for each $y$, and

$$\sup_k \mathbb{E}_{\delta_x}[\alpha_i(X_k)] = \sup_k (P^k\alpha_i)(x) \to 0 \quad \text{as } i \to \infty. \tag{4.6}$$

Then $(X_t)$ has a unique stationary measure $\pi$ and, for all $y \in S$, $\delta_y P^t \to \pi$
as $t \to \infty$.

(ii) Suppose that (4.1) holds, and the constants $\alpha_i$ in Theorem 4.1 satisfy
$\sum_i \alpha_i < \infty$. Suppose further there exists $x \in S$ such that

$$\sup_k (P^k g)(x) < \infty,$$

where $g(y) = d(x,y)$. Then $(X_t)$ has a unique stationary measure $\pi$, and
$\delta_y P^t \to \pi$ as $t \to \infty$ for each $y$.

Furthermore, let $X$ be a stationary copy of $X_t$. Then, for all $u > 0$, and
uniformly over all 1-Lipschitz functions $f$,

$$\mathbb{P}_\pi\big(|f(X) - \mathbb{E}_\pi[f(X)]| \ge 2u\big) \le 2e^{-u^2/(2\sum_{i=1}^{\infty}\alpha_i^2)}. \tag{4.7}$$

(iii) Suppose that $(X_t)$ has a unique stationary measure $\pi$ and condition
(4.3) holds, where $\sum_i \alpha_i < \infty$. Let $x \in S_0^0$, and suppose $\delta > 0$ and
$t_0 > 0$ are such that $d_W(\delta_x P^{t_0}, \pi) \le \delta$ and

$$\mathbb{P}_{\delta_x}\big(X_t \in S_0^0 \text{ for all } t \le t_0\big) \ge 1 - \delta.$$

Let $X$ be a stationary copy of $X_t$. Then, for all $u > \delta$, uniformly over all
1-Lipschitz functions $f$,

$$\mathbb{P}_\pi\big(|f(X) - \mathbb{E}_\pi[f(X)]| \ge 2u\big) \le 2e^{-u^2/(2\sum_{i=1}^{t_0}\alpha_i^2)} + 2\delta. \tag{4.8}$$

Proof. (i) Consider the sequence $P_i$ of measures on $(S, \mathcal{P}(S))$ given by
$P_i = \delta_x P^i$; we have, using the coupling characterisation of the Wasserstein
distance,

$$\begin{aligned}
d_W(P_i, P_{i+k}) = d_W\big(\delta_x P^i, (\delta_x P^k)P^i\big) &\le \sum_{y \in S} (\delta_x P^k)(y)\, d_W(\delta_x P^i, \delta_y P^i) \\
&\le \sum_{y \in S} (\delta_x P^k)(y)\,\alpha_i(y) \le \sup_k \mathbb{E}_{\delta_x}[\alpha_i(X_k)] \to 0
\end{aligned}$$

as $i \to \infty$, by assumption. Thus the sequence $(P_i)$ is a Cauchy sequence
and so, since the space of probability measures on $(S, \mathcal{P}(S))$ is complete
with respect to the Wasserstein distance, it must converge to a probability
measure $\pi$ on $(S, \mathcal{P}(S))$. It is obvious that this measure must be stationary
for $P$.

Now, take $y \in S$, and let $Q_i = \delta_y P^i$. Then

$$d_W(P_i, Q_i) = d_W(\delta_x P^i, \delta_y P^i) \le \alpha_i(y) \to 0 \quad \text{as } i \to \infty.$$

It follows that $Q_i \to \pi$ as $i \to \infty$, and so $\pi$ must be the unique stationary
measure.

(ii) The assumption that $\sum_i \alpha_i < \infty$ implies that $\alpha_i \to 0$ as $i \to \infty$.
Then it is easily seen (using the fact that the distance $d(y,z)$ between each
pair $y, z$ of states is finite) that conditions (4.5) and (4.6) of part (i) hold for
$x$, with $\alpha_i(y) \le \alpha_i d(x,y)$, and so, as in (i), one can prove that there exists
a (necessarily unique) stationary measure $\pi$, and that $\delta_x P^t \to \pi$ as $t \to \infty$
for each $x \in S$.

Let us now prove the concentration of measure result, inequality (4.7).
Take some $x \in S$. Given $\varepsilon > 0$, for $t$ large enough the Wasserstein distance,
and hence the total variation distance, between $\delta_x P^t$ and $\pi$ is at most $\varepsilon$.
Then, for $u > \varepsilon$ and all such $t$, by Theorem 4.1 part (i),

$$\begin{aligned}
\mathbb{P}_\pi\big(|f(X) - \mathbb{E}_\pi[f(X)]| \ge 2u\big) &\le \mathbb{P}_{\delta_x}\big(|f(X_t) - \mathbb{E}_{\delta_x}[f(X_t)]| \ge u\big) + \varepsilon \\
&\le 2e^{-u^2/(2\sum_{i=1}^{\infty}\alpha_i^2)} + \varepsilon.
\end{aligned}$$

Here we have used the fact that

$$|\mathbb{E}_\pi[f(X)] - \mathbb{E}_{\delta_x}[f(X_t)]| \le \varepsilon < u.$$

Since $\varepsilon$ is arbitrary, the result follows.

(iii) Let

$$A_{t_0} = \{\omega : X_t(\omega) \in S_0^0 \ \forall t \in [0, t_0]\}.$$

Arguing as in (ii), and using Theorem 4.1 part (ii), we can write, for $u > \delta$,

$$\begin{aligned}
\mathbb{P}_\pi\big(|f(X) - \mathbb{E}_\pi[f(X)]| \ge 2u\big) &\le \mathbb{P}_{\delta_x}\big(|f(X_{t_0}) - \mathbb{E}_{\delta_x}[f(X_{t_0})]| \ge u\big) + \delta \\
&\le \mathbb{P}_{\delta_x}\big(\{|f(X_{t_0}) - \mathbb{E}_{\delta_x}[f(X_{t_0})]| \ge u\} \cap A_{t_0}\big) + 2\delta \\
&\le 2e^{-u^2/(2\sum_{i=1}^{t_0}\alpha_i^2)} + 2\delta,
\end{aligned}$$

as required.

□

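To prove Theorem 4.1, we shall make use of a concentration inequality
from [26]. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, with $\Omega$ finite. Let $\mathcal{G} \subseteq \mathcal{F}$ be
a $\sigma$-field. Given a bounded random variable $Z$ on $(\Omega, \mathcal{F}, \mathbb{P})$, the supremum
of $Z$ in $\mathcal{G}$ is the $\mathcal{G}$-measurable function given by

$$\sup(Z \mid \mathcal{G})(\omega) = \min_{A \in \mathcal{G} :\, \omega \in A}\ \max_{\omega' \in A} Z(\omega'). \tag{4.9}$$

Thus $\sup(Z \mid \mathcal{G})$ takes the value at $\omega$ equal to the maximum value of $Z$ over
the 'smallest' event in $\mathcal{G}$ containing $\omega$. Since $\Omega$ is finite, we are assured that
the smallest event containing $\omega$ does exist; the arguments used here would
work also in many cases where $\Omega$ is countably infinite.

The conditional range of $Z$ in $\mathcal{G}$, denoted by $\mathrm{ran}(Z \mid \mathcal{G})$, is the $\mathcal{G}$-measurable
function

$$\mathrm{ran}(Z \mid \mathcal{G}) = \sup(Z \mid \mathcal{G}) + \sup(-Z \mid \mathcal{G}). \tag{4.10}$$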
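On a finite sample space every $\sigma$-field is generated by a partition into atoms, and the 'smallest' event containing a point is the atom containing it. Under that observation, the conditional supremum and conditional range can be computed by a maximum over each atom. The following is a minimal sketch with hypothetical names (`conditional_sup`, `conditional_range`) and made-up data; it is only meant to make the two definitions concrete.

```python
def conditional_sup(Z, partition):
    """Return {omega: sup(Z | G)(omega)}, where G is generated by `partition`.

    Z maps sample points to real values; `partition` is a list of disjoint
    blocks (the atoms of G) whose union is the whole sample space.
    """
    out = {}
    for block in partition:
        top = max(Z[w] for w in block)   # maximum of Z over the atom
        for w in block:
            out[w] = top
    return out


def conditional_range(Z, partition):
    """ran(Z | G) = sup(Z | G) + sup(-Z | G), evaluated pointwise."""
    hi = conditional_sup(Z, partition)
    lo = conditional_sup({w: -z for w, z in Z.items()}, partition)
    return {w: hi[w] + lo[w] for w in Z}


if __name__ == "__main__":
    # Made-up example: four sample points and a two-atom sigma-field.
    Z = {"a": 1, "b": 4, "c": 2, "d": 7}
    print(conditional_range(Z, [["a", "b"], ["c", "d"]]))
    print(conditional_range(Z, [["a", "b", "c", "d"]]))
```

With the trivial partition the conditional range is simply the full range of $Z$; refining the partition can only decrease it at each point.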
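Let $\{\emptyset, \Omega\} = \mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots$ be a filtration in $\mathcal{F}$, and let $Z_0, \ldots, Z_t$ be the
martingale obtained by setting $Z_t = \mathbb{E}(Z \mid \mathcal{F}_t)$ for each $t$. For each $t$ let $\mathrm{ran}_t$
denote $\mathrm{ran}(Z_t \mid \mathcal{F}_{t-1})$; by definition, $\mathrm{ran}_t$ is an $\mathcal{F}_{t-1}$-measurable function. For
each $t$, let the sum of squared conditional ranges $R_t^2$ be the random variable
$\sum_{i=1}^{t} \mathrm{ran}_i^2$, and let the maximum sum of squared conditional ranges $\hat r_t^2$ be
the supremum of the random variable $R_t^2$, that is

$$\hat r_t^2 = \sup_{\omega \in \Omega} R_t^2(\omega).$$

The following result is Theorem 3.14 in [26].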

Lemma 4.3. Let $Z$ be a bounded random variable on a probability space
$(\Omega, \mathcal{F}, \mathbb{P})$ with $\mathbb{E}(Z) = m$. Let $\{\emptyset, \Omega\} = \mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}_t$ be a filtration
in $\mathcal{F}$. Then for any $u \ge 0$,

$$\mathbb{P}(|Z - m| \ge u) \le 2e^{-2u^2/\hat r_t^2}.$$

More generally, for any $u \ge 0$ and any value $r_t^2$,

$$\mathbb{P}\big(\{|Z - m| \ge u\} \cap \{R_t^2 \le r_t^2\}\big) \le 2e^{-2u^2/r_t^2}.$$

Proof of Theorem 4.1. Let $f : S \to \mathbb{R}$ be 1-Lipschitz. Fix a time $t \in \mathbb{N}$ and
$x_0 \in S$, and consider the evolution of $X_t$ conditional on $X_0 = x_0$ for $t$ steps,
that is until time $t$. Since we have assumed that there are only a finite
number of possible transitions from any given $x \in S$, we can build this
conditional process until time $t$ on a finite probability space $(\Omega, \mathcal{F}, \mathbb{P}_{\delta_{x_0}})$: we
can take $\Omega$ to be the finite set of all possible paths of the process starting
at time 0 in state $x_0$ until time $t$, and $\mathcal{F}$ to be the power set of $\Omega$.

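In the conditional space, for each time $j = 0, \ldots, t$, let $\mathcal{F}_j = \sigma(X_0, \ldots, X_j)$,
the $\sigma$-field generated by $X_0, \ldots, X_j$; so $\mathcal{F}_0 = \{\emptyset, \Omega\}$ and $\mathcal{F}_t = \mathcal{F}$. We write
$\mathbb{E}$ instead of $\mathbb{E}_{\delta_{x_0}}$ in what follows to lighten the notation.

Consider the random variable $Z = f(X_t) : \Omega \to \mathbb{R}$. Also, for $j = 0, \ldots, t$
let $Z_j$ be given by

$$Z_j = \mathbb{E}[f(X_t) \mid \mathcal{F}_j] = \mathbb{E}_{\delta_{x_0}}[f(X_t) \mid X_0, \ldots, X_j] = (P^{t-j}f)(X_j),$$

where we have used the Markov property in the last equality.

Fix $1 \le j \le t$; we want to upper bound $\mathrm{ran}_j = \mathrm{ran}(Z_j \mid \mathcal{F}_{j-1})$. Fix also
$x_1, \ldots, x_{j-1} \in S$, and for $x \in S$ consider

$$g(x) = \mathbb{E}[f(X_t) \mid X_j = x] = \mathbb{E}[f(X_{t-j}) \mid X_0 = x] = (P^{t-j}f)(x).$$

Note that $Z_j(\omega) \in \{g(x) : d(x, x_{j-1}) \le 1\}$ for $\omega$ such that $X_{j-1}(\omega) = x_{j-1}$.
It follows that, for such $\omega$,

$$\mathrm{ran}_j(\omega) = \sup_{x,y :\, d(x,x_{j-1}) \le 1,\ d(y,x_{j-1}) \le 1} |g(x) - g(y)|.$$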
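Let us prove part (i) of the theorem. As $f$ is 1-Lipschitz,

$$\begin{aligned}
\sup_{x,y :\, d(x,y) \le 2} |g(x) - g(y)| &= \sup_{x,y :\, d(x,y) \le 2} |(P^{t-j}f)(x) - (P^{t-j}f)(y)| \\
&= \sup_{x,y :\, d(x,y) \le 2} |\mathbb{E}_{\delta_x P^{t-j}}(f) - \mathbb{E}_{\delta_y P^{t-j}}(f)| \\
&\le 2 \sup_{x,y :\, d(x,y) \le 1} |\mathbb{E}_{\delta_x P^{t-j}}(f) - \mathbb{E}_{\delta_y P^{t-j}}(f)| \\
&\le 2 \sup_{x,y :\, d(x,y) \le 1} d_W(\delta_x P^{t-j}, \delta_y P^{t-j}) \\
&\le 2\alpha_{t-j},
\end{aligned}$$

by assumption. We deduce that $\mathrm{ran}_j(\omega) \le 2\alpha_{t-j}$ for all $\omega \in \Omega$. It follows
that

$$R_t^2(\omega) \le 4\sum_{r=0}^{t-1} \alpha_{t-r}^2,$$

uniformly over $\omega \in \Omega$. Part (i) of Theorem 4.1 now follows from Lemma 4.3.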
To prove (ii), observe that the bound

$$\mathrm{ran}_j(\omega) = \mathrm{ran}(Z_j \mid \mathcal{F}_{j-1})(\omega) \le 2\alpha_{t-j}$$

still holds on the event $A_t = \{\omega : X_j(\omega) \in S_0^0 \text{ for } j = 0, \ldots, t\}$. □

The following special case of a model satisfying the hypotheses of
Theorem 4.1 is of particular interest and has received considerable attention in
the computer science literature; see for instance @; B 12]. Suppose (4.1) is
satisfied with $\alpha_i = \alpha^i$, where $0 < \alpha < 1$ is a constant. In the language
of [3] this corresponds to the following situation. Consider different copies
$(X_t)$, $(X'_t)$ of the process with initial states $x$, $x'$ respectively, that is $X_0 = x$
and $X'_0 = x'$ almost surely. Suppose that we can couple $(X_t)$, $(X'_t)$ so that,
uniformly over all pairs of states $x, x' \in S$ with $d(x,x') = 1$,

$$\mathbb{E}[d(X_1, X'_1) \mid X_0 = x, X'_0 = x'] \le \alpha,$$

for a constant $0 < \alpha < 1$. Thus, under the coupling, $(X_t)$, $(X'_t)$ will be
getting closer and closer together on average as $t$ gets larger, which
implies strong mixing properties @;[l2T]. Then, uniformly over $x, x' \in S$ with
$d(x,x') = 1$, $d_W(\delta_x P, \delta_{x'} P) \le \alpha$. By 'path coupling' @;[l2]

$$\mathbb{E}[d(X_1, X'_1) \mid X_0 = x, X'_0 = x'] \le \alpha\, d(x,x'),$$

and hence $d_W(\delta_x P, \delta_{x'} P) \le \alpha\, d_W(\delta_x, \delta_{x'})$ for all pairs $x, x' \in S$. By induction
on $t$,

$$d_W(\delta_x P^t, \delta_{x'} P^t) \le \alpha^t d(x,x')$$
for all $x, x' \in S$ and all $t \in \mathbb{N}$. Then, in the same notation as earlier, we can
upper bound

$$\hat r_t^2 \le 4\sum_{r=1}^{t} \alpha^{2r} \le 4\alpha^2(1 - \alpha^2)^{-1}$$

for all $t$. Hence we obtain the following corollary.

Corollary 4.4. Suppose that there is a constant $0 < \alpha < 1$ such that

$$d_W(\delta_x P, \delta_{x'} P) \le \alpha \tag{4.11}$$

for all $x, x' \in S$ such that $d(x,x') = 1$. Then for all $t > 0$

$$\mathbb{P}_{\delta_{x_0}}\big(|f(X_t) - \mathbb{E}_{\delta_{x_0}}[f(X_t)]| \ge u\big) \le 2e^{-u^2(1-\alpha^2)/(2\alpha^2)} \tag{4.12}$$

for all $u > 0$, all $x_0 \in S$, and for every 1-Lipschitz function $f$ on $S$.

Hence, if $X$ has the equilibrium distribution $\pi$ then, for all $u > 0$ and
every 1-Lipschitz function $f$,

$$\mathbb{P}_\pi\big(|f(X) - \mathbb{E}_\pi[f(X)]| \ge u\big) \le 2e^{-u^2(1-\alpha^2)/(2\alpha^2)}. \tag{4.13}$$

The particular choice of $\alpha = 1 - c_1/n$ for a constant $c_1 > 0$ corresponds
to the 'optimal' mixing time $O(n\log n)$ for a Markov chain in a system with
size measure $n$, and gives concentration of measure in equilibrium of the
form

$$\mathbb{P}_\pi\big(|f(X_t) - \mathbb{E}_\pi[f(X_t)]| \ge u\big) \le 2e^{-u^2/(c_2 n)}, \tag{4.14}$$

where $c_2 > 0$ is a constant. This is the case, for example, for the subcritical
($\beta < 1$) mean-field Ising model discussed in Section 3; see for example [ll|
or [lH for a description of the coupling that implies fast decay of the
Wasserstein distance. The same also applies to the Glauber dynamics for colourings
on bounded-degree graphs analysed in [7] (see also [1] and [13]). The
application is straightforward when the number of colours $k$ is greater than
$2D$, where $D$ is the maximum degree of the graph. It is only a little more
involved in the case $(2 - \eta)D \le k \le 2D$, where the proof in [7] relies on
delayed path-coupling [3], whereby a new Markov chain is used with one step
corresponding to $cn$ steps of the original one, $n$ being the size of the graph
to colour.
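As a sanity check of a bound of the form $2e^{-u^2(1-\alpha^2)/(2\alpha^2)}$ with $\alpha = 1 - 1/n$, consider an example that is not treated in the paper: the walk on the hypercube $\{0,1\}^n$ which at each step picks a uniformly random coordinate and replaces it by a fresh fair bit. Coupling two copies that differ in one coordinate by using the same coordinate and the same bit leaves them at distance 1 unless the differing coordinate is picked, so the one-step contraction constant is $1 - 1/n$. The stationary law is uniform, and the Hamming weight is a 1-Lipschitz function whose stationary law is Binomial$(n, 1/2)$, so its exact tail can be compared with the bound. The function names below are hypothetical.

```python
import math


def binomial_two_sided_tail(n, u):
    """P(|W - n/2| >= u) for W ~ Binomial(n, 1/2), computed exactly."""
    total = sum(math.comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= u)
    return total / 2 ** n


def contraction_bound(u, alpha):
    """Bound of the form 2 exp(-u^2 (1 - alpha^2) / (2 alpha^2))."""
    return 2.0 * math.exp(-u * u * (1.0 - alpha ** 2) / (2.0 * alpha ** 2))


n = 20                      # illustrative dimension
alpha = 1.0 - 1.0 / n       # contraction constant of the coupling described above
rows = [
    (u, binomial_two_sided_tail(n, u), contraction_bound(u, alpha))
    for u in range(0, n // 2 + 1)
]

if __name__ == "__main__":
    for u, exact, bound in rows:
        print(f"u={u:2d}  exact tail={exact:.3e}  bound={bound:.3e}")
```

In this example the exponent $(1-\alpha^2)/(2\alpha^2)$ is of order $1/n$, in line with the form (4.14); the exact binomial tail is smaller than the bound for every $u$, as it must be.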

On the other hand, $\alpha = 1 - 6/(n^3 - n)$ for the Glauber dynamics on linear
extensions of a partial order of size $n$ (3; H3] gives an upper bound $O(n^3\log n)$
on mixing. The corresponding bound on deviations of a 1-Lipschitz function
from its mean of size $u$ is of the form $2e^{-u^2/(cn^3)}$, which is useless. However,
one cannot do much better in general. To see this, consider the partial order
on $n$ points consisting of a chain of length $n - 1$ and a single incomparable
element. It is not hard to check that in this case the mixing time is of the
order $n^3$; see 0j for details. It is also easy to see that there is no normal
concentration of measure in the sense of (4.14).

We shall now apply Theorem 4.1 and Corollary 4.2 to the supermarket
process described in Section 3.2, or rather to the corresponding discrete-time
jump chain $\hat X_t$. Recall that, when in state $x$, the next event is an arrival
with probability $\lambda/(1 + \lambda)$, and is a potential departure with probability
$1/(1 + \lambda)$. Given that the next event is an arrival, the queue to which the
arrival will go is determined by selecting a uniformly random $d$-tuple of
queues and sending the customer to a shortest one among those chosen, ties
being split by always going to the first best queue in the list. Given that the
next event is a potential departure, the departure queue is chosen uniformly
at random among the $n$ possible queues, and departures from empty queues
are ignored. In the Markov chain graph, two states are connected by an
edge if and only if they differ by exactly one customer in one queue. Then a
function $f$ is 1-Lipschitz if and only if it is 1-Lipschitz with respect to the
$\ell_1$ distance on the state space $S$.
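The transition rule just described can be sketched in a few lines. This is an illustrative simulation only, not code from the paper: the names `jump_chain_step` and `queues_at_least` are hypothetical, and the sketch assumes that the $d$-tuple is an ordered tuple drawn uniformly with replacement, which the text above does not specify.

```python
import random


def jump_chain_step(x, lam, d, rng):
    """Apply one transition of the jump chain to the queue-lengths list x, in place.

    x   : list of non-negative integers (queue lengths)
    lam : arrival rate per queue, assumed to satisfy 0 < lam < 1
    d   : number of queues sampled by an arriving customer
    rng : a random.Random instance
    """
    n = len(x)
    if rng.random() < lam / (1.0 + lam):
        # Arrival: ordered d-tuple, sampled with replacement (an assumption).
        choices = [rng.randrange(n) for _ in range(d)]
        # min() returns the first minimal element, so ties go to the
        # first shortest queue in the list.
        target = min(choices, key=lambda q: x[q])
        x[target] += 1
    else:
        # Potential departure: ignored if the chosen queue is empty.
        q = rng.randrange(n)
        if x[q] > 0:
            x[q] -= 1
    return x


def queues_at_least(x, k):
    """Number of queues of length at least k in state x."""
    return sum(1 for length in x if length >= k)


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed, for reproducibility only
    state = [0] * 100               # start from the all-empty state
    for _ in range(50_000):
        jump_chain_step(state, lam=0.7, d=2, rng=rng)
    print([queues_at_least(state, k) for k in range(6)])
```

Each step changes at most one coordinate by one, so consecutive states are at $\ell_1$ distance at most 1, which is the locality property used in this section.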

We focus on the case $d \ge 2$. For $d = 1$, in equilibrium the queue lengths
are independent geometric random variables, so normal concentration of 
measure can be obtained using the standard bounded differences inequal- 
ity 0. 

By Lemma 2.3 in [18], for all $x, y \in S$ such that $d(x,y) = 1$, and all $t \ge 0$,

$$d_W(\delta_x P^t, \delta_y P^t) \le 1.$$

Let $c$ be a positive constant, and let $S_0$ be given by

$$S_0 = \{x \in S : \|x\|_1 \le cn,\ \|x\|_\infty \le c\log n\}.$$

It is very easy to modify the proof of Lemma 2.6 in [18] to show that, if
$x, y \in S_0$ and $d(x,y) = 1$, then for some constants $\alpha, \beta > 0$,

$$d_W(\delta_x P^t, \delta_y P^t) \le e^{-\beta t/n} + 2e^{-\beta n} \tag{4.15}$$

for $t \ge \alpha n\log n$.

Take a constant $K > 2$ and let $\tau = n^K$. Then we can put $\alpha_t = 1$ for
$t < \alpha n\log n$, and $\alpha_t = e^{-\beta t/n} + 2e^{-\beta n}$ for $\alpha n\log n \le t \le \tau$. Then for $t \le \tau$,
we can upper bound

$$\sum_{i=1}^{t} \alpha_i^2 \le \min\{t,\ \alpha n\log n + n^{1-\alpha\beta}\beta^{-1} + 2e^{-\beta n/2}\} \le \min\{t,\ 2\alpha n\log n\}.$$

Consider the all-empty state, $0 \in S_0^0$. Then by choosing the constant $c$ in
the definition of $S_0$ sufficiently large, we can ensure that, for $d \ge 2$,

$$\mathbb{P}_{\delta_0}\big(\hat X_t \in S_0^0 \ \forall t \le \tau\big) \ge 1 - e^{-(\log n)^2/c}.$$

This follows from Lemma 2.3 (monotone coupling for given $n$ and $d$), Lemma
2.4 (a) and the monotone coupling for given $n$ and different $d$, $d'$ (see the
proof of Lemma 2.4 in [18]) and equation (37) in [18]. (See also the
statements of these results in Section 3.2.)

By Theorem 4.1 (i), we can choose $c$ sufficiently large so that, for all $t > 0$,
all $u > 0$, and every 1-Lipschitz function $f$,

$$\mathbb{P}_{\delta_0}\big(|f(\hat X_t) - \mathbb{E}_{\delta_0}[f(\hat X_t)]| \ge u\big) \le 2e^{-u^2/(ct)}. \tag{4.16}$$

By Theorem 4.1 (ii), for $\alpha n\log n \le t \le \tau$, and all $u > 0$,

$$\mathbb{P}_{\delta_0}\big(|f(\hat X_t) - \mathbb{E}_{\delta_0}[f(\hat X_t)]| \ge u\big) \le 2e^{-u^2/(cn\log n)} + e^{-(\log n)^2/c}. \tag{4.17}$$

In particular, for $\alpha n\log n \le t \le \tau$, and $u \le c_0 n^{1/2}\log^{3/2} n$,

$$\mathbb{P}_{\delta_0}\big(|f(\hat X_t) - \mathbb{E}_{\delta_0}[f(\hat X_t)]| \ge u\big) \le 2e^{-u^2/(cn\log n)}, \tag{4.18}$$

provided that $c$ is large enough. Inequalities (4.16) to (4.18) improve on
what one could obtain for the jump chain from Lemma 3.5 above, for an
interesting range of $u$ and $t$, and it is easy to use them to derive improved
concentration of measure inequalities for the continuous-time chain also. (It is
possible to optimise inequality (4.17) by playing with the definition of $S_0$ to
obtain normal concentration for larger $u$.)

We now want to relate this to concentration of measure in equilibrium,
via Corollary 4.2. It is easy to see from earlier work (see [lH ] and references
therein) that the supermarket jump chain has a unique stationary measure.
(This could also be proven by showing that the hypotheses of Corollary 4.2 (i)
are satisfied, via (4.15) above.)

By Lemma 2.1 in [18] and straightforward calculations for the Poisson
process, there is a constant $\eta > 0$ such that

$$d_W\big(\mathcal{L}(\hat X_t, 0), \pi\big) \le ne^{-\eta t/n} + 2cn\,\mathbb{P}_\pi(M \ge \eta t/n) + 2e^{-\eta n}, \tag{4.19}$$

where $M$ denotes the maximum queue length in equilibrium, and we may
take $c$ the same as in the definition of $S_0$, assuming that $c$ is sufficiently
large. Thus, by (3.13),

$$d_W\big(\mathcal{L}(\hat X_\tau, 0), \pi\big) \le (n + 2cn + 2)e^{-\eta n}.$$

Let $Y$ denote the queue-lengths vector in equilibrium. It then follows by
Corollary 4.2 (iii), uniformly for all 1-Lipschitz functions $f$, for $u \ge 1$ and $n$
sufficiently large,

$$\mathbb{P}_\pi\big(|f(Y) - \mathbb{E}_\pi[f(Y)]| \ge 2u\big) \le 2e^{-u^2/(cn\log n)} + 2e^{-(\log n)^2/c}. \tag{4.20}$$

So, choosing $c$ to be sufficiently large, for all $u \ge 0$ and $n$ sufficiently large,

$$\mathbb{P}_\pi\big(|f(Y) - \mathbb{E}_\pi[f(Y)]| \ge 2u\big) \le ce^{-u^2/(cn\log n)} + ce^{-(\log n)^2/c}. \tag{4.21}$$

This improves on Lemma 3.6 above, and gives normal concentration for
$u = O(n^{1/2}(\log n)^{3/2})$ (again, it is possible to obtain normal concentration
for larger $u$), but is not the optimal result we are after. In particular, we still
cannot show that deviations of size $n^{1/2}\omega(n)$ have probability tending to 0
for $\omega(n)$ tending to infinity arbitrarily slowly. We will now derive another
inequality that will enable us to achieve our aim.

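Theorem 4.5. Assume that there exists a set $S_0$ and numbers $\alpha_i(x,y)$
($x, y \in S_0$, $i \in \mathbb{N}$) such that, for all $i$, and all $x, y \in S_0$ with $d(x,y) = 1$,

$$d_W(\delta_x P^i, \delta_y P^i) \le \alpha_i(x,y). \tag{4.22}$$

Let

$$S_0^0 = \{x \in S_0 : y \in S_0 \text{ whenever } d(x,y) = 1\}.$$

For $x \in S$, let $g_x(y) = d_W(\delta_y P^i, \delta_x P^i)^2$. Assume that, for some sequence
$(\alpha_i : i \in \mathbb{N})$ of positive constants,

$$\sup_{x_0 \in S_0^0} (Pg_{x_0})(x_0) \le \alpha_i^2. \tag{4.23}$$

Let $t > 0$, let $v = \sum_{i=1}^{t} \alpha_i^2$, and let

$$\hat\alpha = \sup_{1 \le j \le t}\ \sup_{x,y \in S_0 :\, d(x,y) \le 2} \alpha_j(x,y). \tag{4.24}$$

Let also $A_t = \{\omega : X_s(\omega) \in S_0^0 : 0 \le s \le t\}$.

Then, for all $u > 0$, and uniformly over all 1-Lipschitz functions $f$,

$$\mathbb{P}_{\delta_{x_0}}\big(\{|f(X_t) - \mathbb{E}_{\delta_{x_0}}[f(X_t)]| \ge u\} \cap A_t\big) \le 2e^{-u^2/(4v(1 + (\hat\alpha u/6v)))}. \tag{4.25}$$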



To prove Theorem 4.5 we use another result from [26]. With notation as
before, for $j = 1, \ldots, t$, let

$$\mathrm{var}_j = \mathrm{var}(Z_j \mid \mathcal{F}_{j-1}) = \mathbb{E}\big((Z_j - \mathbb{E}(Z_j \mid \mathcal{F}_{j-1}))^2 \mid \mathcal{F}_{j-1}\big);$$

let $V = \sum_{j=1}^{t} \mathrm{var}_j$. Also, for each such $j$, let $\mathrm{dev}_j = \sup(|Z_j - Z_{j-1}| \mid \mathcal{F}_{j-1})$,
and let $\mathrm{dev} = \sup_j \mathrm{dev}_j$. The following result is essentially Theorem 3.15
in [26].

Lemma 4.6. Let $Z$ be a random variable on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$
with $\mathbb{E}(Z) = m$. Let $\{\emptyset, \Omega\} = \mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}_t$ be a filtration in $\mathcal{F}$.
Let $b = \max \mathrm{dev}$, the maximum conditional deviation (and assume that $b$ is
finite). Then for any $u \ge 0$,

$$\mathbb{P}(|Z - m| \ge u) \le 2e^{-u^2/(2\hat v(1 + (bu/3\hat v)))},$$

where $\hat v$ is the maximum sum of conditional variances (which is assumed to
be finite).

More generally, for any $u \ge 0$ and any values $b, v \ge 0$,

$$\mathbb{P}\big(\{|Z - m| \ge u\} \cap \{V \le v\} \cap \{\max \mathrm{dev} \le b\}\big) \le 2e^{-u^2/(2v(1 + (bu/3v)))}.$$
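To see why a variance-based inequality of this kind can help, it is instructive to compare its exponent numerically with that of a bounded-range inequality of the form $2e^{-u^2/(2\sum_i \alpha_i^2)}$. The sketch below is purely illustrative: the function names are hypothetical, and the inputs (a sum of squared ranges equal to $n\log n$, a variance sum $v = 2n$ and a maximum deviation $b = 2$) are stand-in values with all unspecified constants set to convenient numbers, not quantities derived in the paper.

```python
import math


def range_bound(u, sum_sq_alpha):
    """Bounded-range type bound: 2 exp(-u^2 / (2 * sum of alpha_i^2))."""
    return 2.0 * math.exp(-u * u / (2.0 * sum_sq_alpha))


def variance_bound(u, v, b):
    """Variance type bound: 2 exp(-u^2 / (2 v (1 + b u / (3 v))))."""
    return 2.0 * math.exp(-u * u / (2.0 * v * (1.0 + b * u / (3.0 * v))))


n = 10 ** 6
u = 10 * math.sqrt(n)   # a deviation of size n^{1/2} * omega, with omega = 10

if __name__ == "__main__":
    print("range-based bound   :", range_bound(u, n * math.log(n)))
    print("variance-based bound:", variance_bound(u, v=2 * n, b=2.0))
```

With these stand-in values the first bound loses a factor $\log n$ in the exponent, whereas the second remains small for deviations only slightly larger than $n^{1/2}$.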

Proof of Theorem 4.5. The proof is similar to the proof of Theorem 4.1.
Let $f : S \to \mathbb{R}$ be 1-Lipschitz. Fix a time $t \in \mathbb{N}$, an $x_0 \in S$ and consider
the evolution of $X_t$ conditional on $X_0 = x_0$ for $t$ steps, that is until time
$t$. Again this conditional process can be supported by a finite probability
space $(\Omega, \mathcal{F}, \mathbb{P}_{\delta_{x_0}})$.

As before, in the conditional space, for each time $j = 0, \ldots, t$ let $\mathcal{F}_j =
\sigma(X_0, \ldots, X_j)$, the $\sigma$-field generated by $X_0, \ldots, X_j$; so $\mathcal{F}_0 = \{\emptyset, \Omega\}$ and
$\mathcal{F}_t = \mathcal{F}$. Again, we consider the random variable $Z = f(X_t) : \Omega \to \mathbb{R}$. And,
for $j = 0, \ldots, t$, $Z_j$ is given by

$$Z_j = \mathbb{E}[f(X_t) \mid \mathcal{F}_j] = \mathbb{E}_{\delta_{x_0}}[f(X_t) \mid X_0, \ldots, X_j] = (P^{t-j}f)(X_j).$$

Suppose first for simplicity that $S_0 = S$. We want to apply Lemma 4.6, and
for this we need to calculate the conditional variances $\mathrm{var}_j$. To do this, we
use the fact that the variance of a random variable $Y$ is equal to $\frac12\mathbb{E}(Y - \tilde Y)^2$,
where $\tilde Y$ is another random variable with the same distribution as $Y$ and
independent of $Y$.

Fix $j$ and $x_1, \ldots, x_{j-1} \in S$, and for $x \in S$ consider

$$g(x) = \mathbb{E}[f(X_t) \mid X_j = x] = \mathbb{E}[f(X_{t-j}) \mid X_0 = x] = (P^{t-j}f)(x).$$

Then, for $\omega$ such that $X_{j-1}(\omega) = x_{j-1}$, $Z_j(\omega) \in \{g(x) : d(x, x_{j-1}) \le 1\}$,
so that

$$\begin{aligned}
\mathrm{var}_j(\omega) &= \frac12 \sum_{x,y} P(x_{j-1},x)P(x_{j-1},y)\,(g(x) - g(y))^2 \\
&\le \frac12 \sum_{x,y :\, d(x_{j-1},x) \le 1,\ d(x_{j-1},y) \le 1} P(x_{j-1},x)P(x_{j-1},y)\, d_W(\delta_x P^{t-j}, \delta_y P^{t-j})^2 \\
&\le 2 \sum_{x :\, d(x_{j-1},x) \le 1} P(x_{j-1},x)\, d_W(\delta_x P^{t-j}, \delta_{x_{j-1}} P^{t-j})^2 \\
&\le 2 \sum_{x} P(x_{j-1},x)\,\alpha_{t-j}(x_{j-1},x)^2 \le 2\alpha_{t-j}^2
\end{aligned}$$

by assumption (4.23). Here the third line uses the triangle inequality for $d_W$
together with $(a + b)^2 \le 2a^2 + 2b^2$.

Then we can upper bound the sum

$$V \le 2\sum_{i=1}^{t} \alpha_i^2 = 2v.$$

It remains to bound $\mathrm{dev} = \sup_j \mathrm{dev}_j$. For $\omega$ such that $X_{j-1}(\omega) = x_{j-1}$,

$$\begin{aligned}
\mathrm{dev}_j(\omega) &\le \sup_{x :\, d(x,x_{j-1}) \le 1} |g(x) - (P^{t-j+1}f)(x_{j-1})| \\
&= \sup_{x :\, d(x,x_{j-1}) \le 1} |(P^{t-j}f)(x) - (P^{t-j+1}f)(x_{j-1})| \\
&\le \sup_{x :\, d(x,x_{j-1}) \le 1} d_W(\delta_x P^{t-j}, \delta_{x_{j-1}} P^{t-j+1}).
\end{aligned}$$

It follows that, for each $j = 1, \ldots, t$,

$$\begin{aligned}
\mathrm{dev}_j &\le \sup_{x,y :\, d(x,y) \le 1} d_W(\delta_x P^{t-j+1}, \delta_y P^{t-j}) \\
&\le \sup_{x,y :\, d(x,y) \le 2} d_W(\delta_x P^{t-j}, \delta_y P^{t-j}) \\
&\le \hat\alpha,
\end{aligned}$$

by (4.24) and using the coupling characterisation of the Wasserstein distance.
Theorem 4.5 now follows from the first statement in Lemma 4.6 in the case
where $S_0 = S$. In general, the above bounds on $V$ and $\mathrm{dev}$ hold on the event
$A_t = \{\omega : X_j(\omega) \in S_0^0 \text{ for } j = 0, \ldots, t\}$, and so Theorem 4.5 also follows
from the second statement of Lemma 4.6. □



Let us now apply Theorem 4.5 to the supermarket model from [18]
discussed above. Again, we focus on the case $d \ge 2$.
Let $c$ be a positive constant, and let $S_0$ be given by

$$S_0 = \Big\{x \in S : \ell(k,x) = \sum_{r=1}^{n} \mathbf{1}_{x(r) \ge k} \le ne^{-k/c} \text{ for } k = 1, 2, \ldots\Big\}.$$

Consider the all-empty state, $0 \in S_0^0$. Let $K > 2$ be a constant. We claim
that we can choose $c$ sufficiently large that, if $\tau = n^K$, then

$$\mathbb{P}_{\delta_0}\big(\hat X_t \in S_0^0 : t \le \tau\big) \ge 1 - e^{-(\log n)^2/c}.$$

This follows easily from Lemma 3.8 in the present paper, together with
equation (3.13).

We now want to calculate the quantity in (4.23). For a state $x_0 \in S_0^0$
and a state $x$ chosen with probability $P(x_0, x)$, these states will only differ
in a queue of length greater than $k$ if $P(x_0, x)$ is the probability of an event
involving a queue of length at least $k$: a departure from a queue of length
at least $k$ or an arrival into a queue of length at least $k$. For $x_0 \in S_0^0$ such a
transition happens with probability at most $ce^{-k/c}$ (choosing $c$ large enough
again).

The proof of Lemma 2.6 in [18] shows that, if $x, y \in S_0$ are adjacent and
differ in a queue of length $k$, then for some constants $\alpha, \beta > 0$ we can upper
bound

$$d_W(\delta_x P^t, \delta_y P^t) \le e^{-\beta t/n} + 2e^{-\beta n}$$

for $t \ge \alpha kn$. Also, by Lemma 2.3 in [18],

$$d_W(\delta_x P^t, \delta_y P^t) \le 1$$

for all $t$ and hence for $t < \alpha kn$.

Combining the above observations and choosing $\alpha > 1$ large enough, we
find that for $t \ge \alpha^2 n$

$$\sup_{x_0 \in S_0^0} \mathbb{E}_{\delta_{x_0}}\big[d_W(\delta_{X_1} P^t, \delta_{x_0} P^t)^2\big] \le e^{-t/(\alpha n)} + e^{-n/\alpha}.$$

Hence, by choosing $c$ large enough, we can upper bound

$$v = \sum_{i=1}^{t} \alpha_i^2 \le cn.$$

Further, once again using Lemma 2.3 in [18], we can upper bound $\hat\alpha \le 2$.

By Theorem 4.5 there is a constant $c > 0$ such that, uniformly for all
1-Lipschitz functions $f$, all $t \le \tau$, and all $u > 0$,

$$\mathbb{P}_{\delta_0}\big(|f(\hat X_t) - \mathbb{E}_{\delta_0}[f(\hat X_t)]| \ge u\big) \le 2e^{-u^2/(4c(n+u))} + e^{-(\log n)^2/c}. \tag{4.26}$$

In particular, we can choose $c$ large enough so that, for $u \le c_0\sqrt{n}\log n$,

$$\mathbb{P}_{\delta_0}\big(|f(\hat X_t) - \mathbb{E}_{\delta_0}[f(\hat X_t)]| \ge u\big) \le 3e^{-u^2/(cn)}. \tag{4.27}$$

Now, as before, by (4.19),

$$d_W(\delta_0 P^\tau, \pi) \le (n + 2cn + 2)e^{-\eta n}$$

provided $c$ is large enough. It follows that for $n$ large enough, uniformly for
all 1-Lipschitz functions $f$, and all $u \ge 1$,

$$\begin{aligned}
\mathbb{P}_\pi\big(|f(Y) - \mathbb{E}_\pi[f(Y)]| \ge 2u\big) &\le \mathbb{P}_{\delta_0}\big(|f(\hat X_\tau) - \mathbb{E}_{\delta_0}[f(\hat X_\tau)]| \ge u\big) + (n + 2cn + 2)e^{-\eta n} \\
&\le 2e^{-u^2/(4c(n+u))} + 2e^{-(\log n)^2/c}.
\end{aligned} \tag{4.28}$$

It follows that, for $0 \le u \le c_0 n^{1/2}\log n$, we obtain

$$\mathbb{P}_\pi\big(|f(Y) - \mathbb{E}_\pi[f(Y)]| \ge 2u\big) \le ce^{-u^2/(cn)}, \tag{4.29}$$

provided that the constant $c$ is chosen sufficiently large. Choosing $u =
\sqrt{n}\,\omega(n)$, where $\omega(n)$ is a function tending to infinity with $n$ arbitrarily
slowly, we obtain

$$\mathbb{P}_\pi\big(|f(Y) - \mathbb{E}_\pi[f(Y)]| \ge u\big) = o(1)$$

as $n \to \infty$.

Inequalities (4.26) and (4.28) could be optimised (by optimising the choice
of set $S_0$) to obtain normal concentration for larger $u$.

For a positive integer $k$, let $\ell(k,Y)$ be the number of queues of length at
least $k$ in the stationary jump chain, and let $\ell(k)$ be its expectation. Then
for any positive integer $s$, and any $u > 0$, we can write

$$\mathbb{E}_\pi\big[|\ell(k,Y) - \ell(k)|^s\big] \le u^s + \sum_{y > u} y^{s-1}\,\mathbb{P}_\pi\big(|\ell(k,Y) - \ell(k)| \ge y\big).$$

Note that the maximum value that $|\ell(k,Y) - \ell(k)|^s$ can take is $n^s$. Then,
taking $u = n^{1/2}$, and applying inequality (4.28), we obtain

$$\mathbb{E}_\pi\big[|\ell(k,Y) - \ell(k)|^s\big] \le cn^{s/2},$$

assuming the constant $c$ is chosen big enough. Hence, arguing as in Section 4
of [18], it is easy to show that

$$\sup_k \big|\mathbb{E}[\ell(k,Y)^r] - \ell(k)^r\big| = O(n^{r-1}).$$

And hence, arguing as in Section 5 of [18], we obtain that, for some constant
$c_0 > 0$,

$$\sup_i \big|n^{-1}\ell(i) - \lambda^{1+d+\cdots+d^{i-1}}\big| \le c_0 n^{-1}, \tag{4.30}$$

which improves on equation (27) in [18], which implies only that

$$\sup_i \big|n^{-1}\ell(i) - \lambda^{1+d+\cdots+d^{i-1}}\big| \le c_0 n^{-1}(\log n)^2.$$



5. Conclusions 

We have derived concentration inequalities for Lipschitz functions of a 
Markov chain long-term and in equilibrium, depending on contractivity 
properties of the chain in question. Our results apply to many natural 
Markov chains in computer science and statistical mechanics. 

One open problem is to show that, in a discrete-time Markov chain with 
'local' transitions, under suitable conditions, rapid mixing occurs essentially 
if and only if there is normal concentration of measure long-term and in equi- 
librium (with non-trivial bounds). Another open question is to explore how 
these properties relate to the cut-off phenomenon. Is it the case that, again 
under suitable assumptions, they are necessary and sufficient conditions for 
a cut-off to occur?